STREPTOCOCCUS PNEUMONIAE KNOCKOUT MUTANTS

All documents cited herein are incorporated by reference in their entirety.
TECHNICAL FIELD 

This invention relates to mutants of the bacterium Streptococcus pneumoniae ('pneumococcus'), and
to the use of pneumococcal proteins in screening methods. 

BACKGROUND ART 

Streptococcus pneumoniae is a Gram-positive spherical bacterium. It is the most common cause of 
acute bacterial meningitis in adults and in children over 5 years of age. 

It is an object of the invention to provide materials for improving the prevention, detection and
treatment of S.pneumoniae infections. More specifically, it is an object of the invention to provide
mutants of S.pneumoniae in which specific genes have been inactivated, and to provide specific
genes and gene products from S.pneumoniae for use as targets for anti-pneumococcal drugs.

DISCLOSURE OF THE INVENTION 

Genome sequences of several strains of S.pneumoniae are available, including those of 23F [1], 670
[2], R6 [3,4] and TIGR4 [5,6]. Functional annotations of inferred coding sequences within these
genome sequences are also available. Knowledge of sequence and/or annotation, however, does not
necessarily reveal the importance of a gene product in the life cycle of pneumococcus, or the
suitability of the gene product as a target for pharmaceutical intervention.

In the S.pneumoniae TIGR4 strain, 91 genes (see Table 1) have been identified which, when knocked
out, result in a lethal phenotype. A further 10 genes (Table 2) have been identified which, when
knocked out, result in poor growth characteristics when cultured in the absence of blood. These 101
genes are essential to bacterial growth and are thus useful antibiotic targets.

Nomenclature 

As mentioned above, genome sequences of several strains of S.pneumoniae are available. Genes are
referred to below by a name "SPnnnn", which refers to the gene numbering assigned to the TIGR4
strain by Tettelin et al. [6]. This numbering unambiguously identifies any particular gene in the
TIGR4 strain, and the gene's sequence and chromosomal location from the TIGR4 genome can
readily be used to identify the corresponding gene in any other strain of S.pneumoniae. For ease of
reference, the corresponding gene in the R6 genome [4] is also indicated.

Knockout bacteria

The invention provides a S.pneumoniae bacterium in which expression of one or more of the genes
listed in Tables 1 & 2 has been knocked out.

Techniques for gene knockout are well known, and knockout mutants of S.pneumoniae have been
reported previously [e.g. refs. 7-10 etc.].

The knockout is preferably achieved using isogenic deletion of the coding region, but any other
suitable technique may be used e.g. deletion or mutation of the promoter, deletion or mutation of the
start codon, antisense inhibition, inhibitory RNA, etc. In the resulting bacterium, however, mRNA
encoding the gene product of Tables 1 & 2 will be absent and/or its translation will be inhibited (e.g.
to less than 1% of wild-type levels).
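
As an illustration of the expression criterion above, the following minimal sketch checks whether a
measured expression level in a mutant falls below a chosen fraction of the wild-type level. The
function name `is_knocked_out`, its arguments and the use of a single ratio are assumptions made for
illustration; the 1% default simply mirrors the example figure given above and is not a required
threshold.

```python
def is_knocked_out(mutant_level: float, wild_type_level: float,
                   max_fraction: float = 0.01) -> bool:
    """Return True if mutant expression is below `max_fraction` of wild-type.

    Both levels are assumed to be in the same arbitrary units, e.g.
    normalised transcript or protein signal (hypothetical inputs).
    """
    if wild_type_level <= 0:
        raise ValueError("wild-type level must be positive")
    if mutant_level < 0:
        raise ValueError("mutant level cannot be negative")
    return (mutant_level / wild_type_level) < max_fraction


# Example: 0.5 units in the mutant against 100 units in wild-type is 0.5%.
print(is_knocked_out(0.5, 100.0))
```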

The bacterium may contain a marker gene in place of the knocked out gene e.g. an antibiotic 
resistance marker. 

Screening methods 

The invention provides a process for determining whether a test compound down-regulates
expression of a target polypeptide, comprising the steps of: (a) contacting the test compound with a
S.pneumoniae bacterium to form a mixture; (b) incubating the mixture to allow the compound and
the bacterium to interact; and (c) determining whether expression of the target polypeptide is
down-regulated. The compound may act by inhibiting transcription or translation.

The invention also provides a process for determining whether a test compound binds to a target
polypeptide, comprising the steps of: (a) contacting the test compound with the target polypeptide to
form a mixture; (b) incubating the mixture to allow the compound and the target polypeptide to
interact; and (c) determining whether the compound and polypeptide interact.

Where a target polypeptide is an enzyme, the invention also provides a process for determining
whether a test compound inhibits the enzymatic activity of a target polypeptide, comprising the steps
of: (a) contacting the test compound with the target polypeptide and a substrate for the enzymatic
reaction catalysed by the target polypeptide; (b) incubating the mixture to allow the compound, target
polypeptide and substrate to interact; and (c) determining whether modification of the substrate by
the enzymatic activity is inhibited by the test compound.

The target polypeptide is preferably a S.pneumoniae polypeptide, and more preferably it is a
S.pneumoniae polypeptide encoded by one of the genes listed in Table 1 or Table 2 (or a
polypeptide as specified in the middle column of Table 1 or Table 2). The polypeptide may be from
any suitable strain e.g. encoded by the polA gene from the 23F strain. The availability of sequence
information for each of the genes listed in Tables 1 and 2 means that the skilled person will readily
be able to identify a gene of interest in any strain of interest, if that identification has not already
been made. For example, the sequence of the nadE gene from strain R6 (SPR1276) helps the skilled
person to find the nadE gene in any other strain.

As an alternative, the target polypeptide comprises (a) an amino acid sequence having sequence
identity to the amino acid sequence encoded by one of the genes listed in Tables 1 & 2 and/or (b)
an amino acid sequence comprising a fragment of the amino acid sequence encoded by one of the
genes listed in Tables 1 & 2. The polypeptide preferably retains the activity listed in Tables 1 & 2.

The degree of sequence identity is preferably greater than 50% (e.g. 60%, 70%, 80%, 90%, 95%,
99% or more). These proteins include homologs, orthologs, allelic variants and functional mutants of
the Table 1 polypeptides. Identity between proteins is preferably determined by the Smith-Waterman
homology search algorithm as implemented in the MPSRCH program (Oxford Molecular), using an
affine gap search with parameters gap open penalty=12 and gap extension penalty=1.
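
The identity calculation can be sketched in code. The following is a minimal, unoptimised
Smith-Waterman local alignment with affine gaps, using the gap open penalty of 12 and gap extension
penalty of 1 given above. It is not the MPSRCH implementation: the simple match/mismatch scores
stand in for a substitution matrix, the convention that the first gapped residue costs the open
penalty and each further residue costs the extension penalty is an assumption, and reporting identity
as identical positions divided by aligned columns is only one of several possible definitions.

```python
def smith_waterman_identity(a, b, match=5, mismatch=-4, gap_open=12, gap_extend=1):
    """Return (best local score, percent identity over the best local alignment)."""
    n, m = len(a), len(b)
    neg = float("-inf")
    H = [[0] * (m + 1) for _ in range(n + 1)]    # best score ending at (i, j)
    E = [[neg] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in a
    F = [[neg] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in b
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell, counting aligned columns and identities.
    i, j = best_pos
    state, identical, columns = "H", 0, 0
    while i > 0 and j > 0:
        if state == "H":
            if H[i][j] == 0:
                break
            s = match if a[i - 1] == b[j - 1] else mismatch
            if H[i][j] == H[i - 1][j - 1] + s:
                if a[i - 1] == b[j - 1]:
                    identical += 1
                columns += 1
                i, j = i - 1, j - 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":
            columns += 1
            if E[i][j] == H[i][j - 1] - gap_open:
                state = "H"  # this column opened the gap
            j -= 1
        else:
            columns += 1
            if F[i][j] == H[i - 1][j] - gap_open:
                state = "H"
            i -= 1
    identity = 100.0 * identical / columns if columns else 0.0
    return best, identity


score, identity = smith_waterman_identity("AAAAACAAAAA", "AAAAAGAAAAA")
print(score, round(identity, 1), identity > 50.0)
```

In practice a protein comparison would use a substitution matrix (for example a PAM or BLOSUM
matrix) rather than fixed match and mismatch scores, so the numbers produced by this sketch are not
comparable with MPSRCH output.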

The fragment should comprise at least n consecutive amino acids from the sequences and, depending
on the particular sequence, n is 7 or more (e.g. 8, 10, 12, 14, 16, 18, 20, 30, 40, 50, 60, 70, 80, 90,
100 or more). Preferably the fragment comprises one or more epitopes from the sequence. The
fragment may be a Table 1 polypeptide without one or more of its N-terminal amino acids e.g.
lacking the N-terminus methionine and/or the N-terminus signal peptide.
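
A fragment criterion of this kind can be checked mechanically. The sketch below, assuming that
sequences are held as plain one-letter amino acid strings, tests whether a candidate shares at least
n consecutive residues with a reference sequence. The function name and the example sequences are
hypothetical and are not taken from Tables 1 or 2.

```python
def shares_fragment(candidate: str, reference: str, n: int = 7) -> bool:
    """Return True if `candidate` contains >= n consecutive residues of `reference`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(candidate) < n or len(reference) < n:
        return False
    # Collect every window of length n from the reference sequence.
    windows = {reference[i:i + n] for i in range(len(reference) - n + 1)}
    # The candidate qualifies if any of its own windows is among them.
    return any(candidate[i:i + n] in windows
               for i in range(len(candidate) - n + 1))


reference = "MKTAYIAKQRQISFVK"                     # made-up sequence
print(shares_fragment("GGAYIAKQRGG", reference))   # shares the 7-mer AYIAKQR
print(shares_fragment("GGAYIAKQGG", reference))    # longest shared run is 6
```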

As a further alternative, the polypeptide may be the homolog of a Table 1 polypeptide from another
Streptococcus (such as S.pyogenes or S.agalactiae) or from another Gram-positive bacterium.

Polypeptides for use in the process of the invention can be prepared by various means (e.g.
recombinant expression, purification from S.pneumoniae, chemical synthesis, etc.) and in various
forms (e.g. native, fusions, non-glycosylated, etc.). As reagents, they are preferably used in
substantially pure form (i.e. substantially free from other streptococcal or host cell proteins). The
polypeptide may be immobilised on a support, either covalently or non-covalently. Polypeptides can
be coated directly onto supports, or can be attached indirectly e.g. by the use of non-neutralising
antibodies which are themselves attached to the support.

The test compound may be of extracellular, intracellular, biologic or chemical origin. Typical test
compounds include peptides, peptoids, lipids, nucleotides, nucleosides, small organic molecules,
antibiotics, polyamines, polymers, or derivatives thereof. Small organic molecules have a molecular
weight of between 50 and 2500 Da, and most preferably between about 300 and about 800 Da.
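
The molecular weight ranges above can be applied as a simple filter when triaging a compound
library. The following sketch is illustrative only: the function name, the returned labels and the
treatment of the range boundaries as inclusive are assumptions, and the word "about" in the preferred
range is ignored for simplicity.

```python
def classify_by_molecular_weight(mw_da: float) -> str:
    """Classify a compound by molecular weight in daltons (Da)."""
    if mw_da <= 0:
        raise ValueError("molecular weight must be positive")
    if 300 <= mw_da <= 800:
        return "most preferred"          # about 300 to about 800 Da
    if 50 <= mw_da <= 2500:
        return "small organic molecule"  # 50 to 2500 Da
    return "outside range"


for mw in (45.0, 180.2, 500.0, 3000.0):
    print(mw, classify_by_molecular_weight(mw))
```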

The test compound may be in a purified form, or may be part of a mixture of substances, such as
extracts containing natural products, or the products of mixed combinatorial syntheses. Test
compounds may be derived from large libraries of synthetic or natural compounds. For instance,
synthetic compound libraries are commercially available, as are libraries of natural compounds in the
form of bacterial, fungal, plant and animal extracts. If a mixture is found to have a useful activity
then that activity can then be traced to specific component(s) either by knowing the components and
testing them individually, or by purification or deconvolution. Additionally, test compounds may be
synthetically produced using combinatorial chemistry either as individual compounds or as mixtures.

The screening method of the invention is preferably arranged in a high-throughput format.
Conveniently, the method is performed in a microtitre plate.

If a test compound binds to a protein of the invention and this binding inhibits the life cycle of the
S.pneumoniae bacterium, then the test compound can be used as an antibiotic or as a lead compound
for the design of antibiotics.

Methods for detecting down-regulation of transcription are well known in the art, and the method of
detection is not critical to the invention. Methods for detecting mRNA include, but are not limited to,
amplification assays such as quantitative RT-PCR, and/or hybridisation assays such as Northern
analysis, dot blots, slot blots, in situ hybridisation, DNA assays, microarray, etc.

Methods for detecting down-regulation of translation are also well known in the art and, again, the
method of detection is not critical to the invention. Methods of polypeptide detection include, but are
not limited to, immunodetection methods such as Western blots, ELISA assays, polyacrylamide gel
electrophoresis, mass spectroscopy, and enzymatic assays.

Methods for detecting a binding interaction are well known in the art and may involve techniques
such as NMR, filter-binding assays, gel-retardation or gel-shift assays, displacement assays, Western
blots, radiolabeled competition assays, co-fractionation by chromatography, co-precipitation,
cross-linking, surface plasmon resonance, reverse two-hybrid, etc. A compound which is found to bind
to a polypeptide can be tested for antibiotic activity by contacting the compound with S.pneumoniae
(or another bacterium) and then monitoring for inhibition of growth.

Direct methods for detecting a binding interaction may involve a labelled test compound and/or
polypeptide. The label may be a fluorophore, radioisotope, or other detectable label. Association of
the label with the polypeptide indicates a binding interaction. Other direct methods for assessing
interaction between the test compound and a target polypeptide may include using NMR to
determine whether a polypeptide:compound complex is present.

Another method of assessing interaction between a polypeptide and a test compound may involve
immobilising the polypeptide on a solid surface and assaying for the presence of free test compound.
If there is no interaction between the test compound and the polypeptide then free test compound will
be detected. The test compound may be labelled to facilitate detection. This type of assay may also
be carried out with the test compound being immobilised on the solid surface. Interaction between the
immobilised polypeptide and the free test compound may also be monitored by a process such as
surface plasmon resonance.

Methods for assessing inhibition of enzymatic activity are well known [e.g. ref. 11]. Enzyme
substrates are widely available from commercial manufacturers, including those adapted for in vitro
assays e.g. coloured substrates or products to give visible indications of enzymatic activity, etc.

In the processes of the invention, a reference standard is typically needed in order to detect whether a
target polypeptide and a test compound interact, or to detect whether expression of a given target
polypeptide has been inhibited, or to detect whether enzymatic activity is inhibited. One standard is a
control experiment run in parallel to a process of the invention in the absence of the test compound.
The results achieved in the control experiment and the process of the invention can then be compared
in order to assess the effect of the test compound. As an alternative to determining the standard in
parallel, it may have been determined before performing the process of the invention, or after the
process has been performed. The standard may be an absolute standard derived from previous work.
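
The comparison against a no-compound control can be expressed numerically. The sketch below
computes percent inhibition of an assay signal relative to a parallel control and flags wells of a
microtitre plate that exceed a cut-off. The function names, the optional blank correction and the 50%
cut-off are hypothetical choices made for illustration and are not values from this specification.

```python
def percent_inhibition(test_signal: float, control_signal: float,
                       blank: float = 0.0) -> float:
    """Percent reduction of the assay signal relative to the no-compound control."""
    span = control_signal - blank
    if span <= 0:
        raise ValueError("control signal must exceed the blank")
    return 100.0 * (1.0 - (test_signal - blank) / span)


def flag_hits(plate: dict, control_signal: float, cutoff: float = 50.0) -> list:
    """Return well names whose percent inhibition is at or above `cutoff`."""
    return [well for well, signal in plate.items()
            if percent_inhibition(signal, control_signal) >= cutoff]


# Made-up readings for three wells, with a control signal of 100 units.
plate = {"A1": 90.0, "A2": 20.0, "A3": 55.0}
print(flag_hits(plate, control_signal=100.0))
```

The same calculation applies whether the signal is an expression level, a binding read-out or an
enzymatic activity, provided the test and control measurements use the same units.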

Some embodiments of the invention comprise using competitive screening assays in which
neutralising antibodies capable of binding a polypeptide of the invention specifically compete with a
test compound for binding to the polypeptide. In this manner, the antibodies can be used to detect the
presence of any peptide which shares one or more antigenic determinants with the S.pneumoniae
polypeptide. Radiolabeled competitive binding studies are described in ref. 12.

In other embodiments, the S.pneumoniae polypeptides are employed as research tools for
identification, characterisation and purification of interacting, regulatory proteins. Appropriate labels
are incorporated into the polypeptides of the invention by various methods known in the art and the
polypeptides are used to capture interacting molecules. For example, molecules are incubated with
the labelled polypeptides, washed to remove unbound polypeptides, and the polypeptide complex is
quantified. Data obtained using different concentrations of polypeptide are used to calculate values
for the number, affinity, and association of polypeptide with the complex.

Compounds identified by screening processes

Test compounds which down-regulate expression of and/or which bind to a target polypeptide and/or
which inhibit an enzymatic activity are useful as antibiotics, antibiotic candidates, or lead compounds
for antibiotic development. Once a test compound has been identified as a compound that binds to a
target polypeptide, or which inhibits its expression in a bacterium, it may be desirable to perform
further experiments to confirm the in vivo function of the compound in inhibiting bacterial growth.
Any of the above processes may therefore comprise the further steps of contacting the test compound
with a bacterium and assessing its effect on bacterial growth and/or survival. Methods for
determining bacterial growth and survival are routinely available.

The invention provides a compound obtained or obtainable by any of the processes described above.

Preferably, the compounds are organic compounds.

Once a compound has been identified using a process of the invention, it may be necessary to
conduct further work on its pharmaceutical properties. For example, it may be necessary to alter the
compound to improve its pharmacokinetic properties or bioavailability. The invention extends to any
compounds identified by the methods of the invention which have been altered to improve their
pharmacokinetic properties and/or bioavailability, and to compositions comprising those compounds.

The invention further provides compounds obtained or obtainable using the processes of the
invention, and compositions comprising those compounds, for use as a medicament e.g. as an
antibiotic. The invention also provides the use of compounds obtained or obtainable using the
processes of the invention in the manufacture of an antibiotic, particularly an antibiotic for treating
S.pneumoniae infection.



The invention also provides a method for producing an antibiotic composition, comprising the steps
of: (a) identifying a compound as described above; (b) manufacturing the compound; (c) formulating
the compound for administration to a patient; and (d) packaging the formulated compound to produce
the antibiotic composition. Details of pharmaceutical formulation can be found in ref. 13.

Combinations of polypeptides 

The invention also provides a composition comprising m or more polypeptides, wherein each of the
m or more polypeptides is: (a) a S.pneumoniae polypeptide encoded by one of the genes listed in
Table 1 or Table 2 or as specified in the middle column of Table 1 or Table 2; (b) a polypeptide
comprising (i) an amino acid sequence having sequence identity to the amino acid sequence encoded
by one of the genes listed in Tables 1 & 2 and/or (ii) an amino acid sequence comprising a
fragment of the amino acid sequence encoded by one of the genes listed in Tables 1 & 2; or (c) a
homolog of a Table 1 polypeptide from another Streptococcus (such as S.pyogenes or S.agalactiae)
or from another Gram-positive bacterium.

The invention also provides a hybrid polypeptide comprising the amino acid sequences of p or more
polypeptides as defined in (a), (b) or (c) above. Thus a plurality of the 101 polypeptides of the
invention are expressed as a single polypeptide chain. Linker peptide sequences may be included
between different members of the 101 polypeptides of the invention.

The values of m and of p are, independently, at least 2 (e.g. 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20 or more).
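
At the sequence level, a hybrid polypeptide of this kind can be pictured as a concatenation of two or
more component sequences with an optional linker between them. The following sketch is a
hypothetical illustration only: the function name, the example sequences and the "GGGGS" linker are
assumptions and are not taken from this specification.

```python
def build_hybrid(sequences: list, linker: str = "") -> str:
    """Join two or more amino acid sequences into a single chain.

    `linker` is inserted between neighbouring components; an empty string
    gives a direct fusion.
    """
    if len(sequences) < 2:
        raise ValueError("a hybrid needs at least 2 component sequences")
    if any(not seq for seq in sequences):
        raise ValueError("component sequences must be non-empty")
    return linker.join(sequences)


# Two made-up fragments joined by a made-up flexible linker.
print(build_hybrid(["MKTAYIAK", "QRQISFVK"], linker="GGGGS"))
```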

The degree of sequence identity is preferably greater than 50% (e.g. 60%, 70%, 80%, 90%, 95%,
99% or more), as mentioned above. A fragment of (b)(ii) should comprise at least n consecutive
amino acids from the sequences, as mentioned above.

Compositions and hybrid polypeptides of the invention are preferably immunogenic, and may be
used for immunisation and vaccination purposes. Compositions may thus include an adjuvant.
Suitable adjuvants include, but are not limited to: (A) aluminium salts, including hydroxides (e.g.
oxyhydroxides), phosphates (e.g. hydroxyphosphates, orthophosphates), sulphates, etc. [e.g. see
chapters 8 & 9 of ref. 14], or mixtures of different aluminium compounds, with the compounds
taking any suitable form (e.g. gel, crystalline, amorphous, etc.), and with adsorption being preferred;
(B) MF59 (5% Squalene, 0.5% Tween 80, and 0.5% Span 85, formulated into submicron particles
using a microfluidizer) [see Chapter 10 of ref. 14; see also ref. 15]; (C) liposomes [see Chapters 13 and
14 of ref. 14]; (D) ISCOMs [see Chapter 23 of ref. 14], which may be devoid of additional detergent
[16]; (E) SAF, containing 10% Squalane, 0.4% Tween 80, 5% pluronic-block polymer L121, and
thr-MDP, either microfluidized into a submicron emulsion or vortexed to generate a larger particle size
emulsion [see Chapter 12 of ref. 14]; (F) Ribi™ adjuvant system (RAS), (Ribi Immunochem)
containing 2% Squalene, 0.2% Tween 80, and one or more bacterial cell wall components from the
group consisting of monophosphorylipid A (MPL), trehalose dimycolate (TDM), and cell wall
skeleton (CWS), preferably MPL + CWS (Detox™); (G) saponin adjuvants, such as QuilA or QS21
[see Chapter 22 of ref. 14], also known as Stimulon™ [17]; (H) chitosan [e.g. 18]; (I) complete
Freund's adjuvant (CFA) and incomplete Freund's adjuvant (IFA); (J) cytokines, such as interleukins
(e.g. IL-1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-12, etc.), interferons (e.g. interferon-γ), macrophage
colony stimulating factor, tumor necrosis factor, etc. [see Chapters 27 & 28 of ref. 14]; (K)
monophosphoryl lipid A (MPL) or 3-O-deacylated MPL (3dMPL) [e.g. chapter 21 of ref. 14]; (L)
combinations of 3dMPL with, for example, QS21 and/or oil-in-water emulsions [19]; (M) a
polyoxyethylene ether or a polyoxyethylene ester [20]; (N) a polyoxyethylene sorbitan ester
surfactant in combination with an octoxynol [21] or a polyoxyethylene alkyl ether or ester surfactant
in combination with at least one additional non-ionic surfactant such as an octoxynol [22]; (N) a
particle of metal salt [23]; (O) a saponin and an oil-in-water emulsion [24]; (P) a saponin (e.g. QS21)
+ 3dMPL + IL-12 (optionally + a sterol) [25]; (Q) E.coli heat-labile enterotoxin ("LT"), or detoxified
mutants thereof, such as the K63 or R72 mutants [e.g. Chapter 5 of ref. 26]; (R) cholera toxin
("CT"), or detoxified mutants thereof [e.g. Chapter 5 of ref. 26]; (S) double-stranded RNA;
(T) microparticles (i.e. a particle of ~100nm to ~150μm in diameter, more preferably ~200nm to
~30μm in diameter, and most preferably ~500nm to ~10μm in diameter) formed from materials that
are biodegradable and non-toxic (e.g. a poly(α-hydroxy acid), a polyhydroxybutyric acid, a
polyorthoester, a polyanhydride, a polycaprolactone, etc.), with poly(lactide-co-glycolide) being
preferred, optionally treated to have a negatively-charged surface (e.g. with SDS) or a
positively-charged surface (e.g. with a cationic detergent, such as CTAB); (U) oligonucleotides comprising
CpG motifs i.e. containing at least one CG dinucleotide, with 5-methylcytosine optionally being used
in place of cytosine; (V) monophosphoryl lipid A mimics, such as aminoalkyl glucosaminide
phosphate derivatives e.g. RC-529 [27]; (W) polyphosphazene (PCPP); (X) a bioadhesive [28] such
as esterified hyaluronic acid microspheres [29] or a mucoadhesive selected from the group consisting
of cross-linked derivatives of poly(acrylic acid), polyvinyl alcohol, polyvinyl pyrrolidone,
polysaccharides and carboxymethylcellulose; or (Y) other substances that act as immunostimulating
agents to enhance the effectiveness of the composition [e.g. see Chapter 7 of ref. 14]. Aluminium
salts are preferred adjuvants for parenteral immunisation. Mutant toxins are preferred mucosal
adjuvants.

Muramyl peptides include N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP),
N-acetyl-normuramyl-L-alanyl-D-isoglutamine (nor-MDP),
N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine
(MTP-PE), etc.

The composition may also comprise other polypeptide or polysaccharide antigens e.g. from
S.pneumoniae, from other bacteria, from other pathogens, etc. Inclusion of saccharide antigens
(preferably conjugated) from Neisseria is convenient.

The composition may also include an antibiotic.

A summary of standard techniques and procedures which may be employed to perform the invention
follows. This summary is not a limitation on the invention but, rather, gives examples that may be
used, but are not required.

General

The practice of the present invention will employ, unless otherwise indicated, conventional techniques of
molecular biology, microbiology, recombinant DNA, and immunology, which are within the skill of the art.
Such techniques are explained fully in the literature e.g. Sambrook Molecular Cloning; A Laboratory Manual,
Second Edition (1989); DNA Cloning, Volumes I and II (D.N. Glover ed. 1985); Oligonucleotide Synthesis (M.J.
Gait ed, 1984); Nucleic Acid Hybridization (B.D. Hames & S.J. Higgins eds. 1984); Transcription and
Translation (B.D. Hames & S.J. Higgins eds. 1984); Animal Cell Culture (R.I. Freshney ed. 1986); Immobilized
Cells and Enzymes (IRL Press, 1986); B. Perbal, A Practical Guide to Molecular Cloning (1984); the Methods
in Enzymology series (Academic Press, Inc.), especially volumes 154 & 155; Gene Transfer Vectors for
Mammalian Cells (J.H. Miller and M.P. Calos eds. 1987, Cold Spring Harbor Laboratory); Mayer and Walker,
eds. (1987), Immunochemical Methods in Cell and Molecular Biology (Academic Press, London); Scopes,
(1987) Protein Purification: Principles and Practice, Second Edition (Springer-Verlag, N.Y.), and Handbook of
Experimental Immunology, Volumes I-IV (D.M. Weir and C.C. Blackwell eds 1986).

Standard abbreviations for nucleotides and amino acids are used in this specification.

Definitions

A composition containing X is "substantially free" of Y when at least 85% by weight of the total X+Y in the
composition is X. Preferably, X comprises at least about 90% by weight of the total of X+Y in the composition,
more preferably at least about 95% or even 99% by weight.

The term "comprising" means "including" as well as "consisting" e.g. a composition "comprising" X may
consist exclusively of X or may include something additional e.g. X + Y.

The term "about" in relation to a numerical value x means, for example, x±10%.

The word "substantially" does not exclude "completely" e.g. a composition which is "substantially free" from Y 
may be completely free from Y. Where necessary, the word "substantially" may be omitted from the definition 
of the invention. 
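
The numerical definitions above lend themselves to a short worked example. The sketch below tests
the "substantially free" criterion as the weight fraction of X in the total of X+Y, and the "about"
criterion as a tolerance of 10% either side of the stated value. The function names and the choice of
inclusive comparisons are assumptions made for illustration.

```python
def substantially_free(weight_x: float, weight_y: float,
                       min_fraction: float = 0.85) -> bool:
    """True if X makes up at least `min_fraction` by weight of the total X+Y."""
    total = weight_x + weight_y
    if weight_x < 0 or weight_y < 0 or total <= 0:
        raise ValueError("weights must be non-negative with a positive total")
    return weight_x / total >= min_fraction


def is_about(value: float, x: float, tolerance: float = 0.10) -> bool:
    """True if `value` lies within x plus or minus `tolerance` (as a fraction of x)."""
    return abs(value - x) <= tolerance * abs(x)


print(substantially_free(90.0, 10.0))      # X is 90% of X+Y
print(substantially_free(90.0, 10.0, 0.95))  # stricter 95% preference
print(is_about(105.0, 100.0))
```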

The term "heterologous" refers to two biological components that are not found together in nature. The
components may be host cells, genes, or regulatory regions, such as promoters. Although the heterologous
components are not found together in nature, they can function together, as when a promoter heterologous to a
gene is operably linked to the gene. Another example is where a streptococcus sequence is heterologous to a
mouse host cell. A further example would be two epitopes from the same or different proteins which have been
assembled in a single protein in an arrangement not found in nature.

An "origin of replication" is a polynucleotide sequence that initiates and regulates replication of polynucleotides,
such as an expression vector. The origin of replication behaves as an autonomous unit of polynucleotide
replication within a cell, capable of replication under its own control. An origin of replication may be needed for
a vector to replicate in a particular host cell. With certain origins of replication, an expression vector can be
reproduced at a high copy number in the presence of the appropriate proteins within the cell. Examples of
origins are the autonomously replicating sequences, which are effective in yeast; and the viral T-antigen,
effective in COS-7 cells.

A "mutant" sequence is defined as DNA, RNA or amino acid sequence differing from but having sequence
identity with the native or disclosed sequence. Depending on the particular sequence, the degree of sequence
identity between the native or disclosed sequence and the mutant sequence is preferably greater than 50% (e.g.
60%, 70%, 80%, 90%, 95%, 99% or more, calculated using the Smith-Waterman algorithm as described above).
As used herein, an "allelic variant" of a nucleic acid molecule, or region, for which nucleic acid sequence is
provided herein is a nucleic acid molecule, or region, that occurs essentially at the same locus in the genome of
another or second isolate, and that, due to natural variation caused by, for example, mutation or recombination,
has a similar but not identical nucleic acid sequence. A coding region allelic variant typically encodes a protein
having similar activity to that of the protein encoded by the gene to which it is being compared. An allelic
variant can also comprise an alteration in the 5' or 3' untranslated regions of the gene, such as in regulatory
control regions (e.g. see US patent 5,753,235).

The streptococcus nucleotide sequences can be expressed in a variety of different expression systems; for
example those used with mammalian cells, baculoviruses, plants, bacteria, and yeast.
i. Mammalian Systems 

Mammalian expression systems are known in the art. A mammalian promoter is any DNA sequence capable of
binding mammalian RNA polymerase and initiating the downstream (3') transcription of a coding sequence (e.g.
structural gene) into mRNA. A promoter will have a transcription initiating region, which is usually placed
proximal to the 5' end of the coding sequence, and a TATA box, usually located 25-30 base pairs (bp) upstream
of the transcription initiation site. The TATA box is thought to direct RNA polymerase to begin RNA
synthesis at the correct site. A mammalian promoter will also contain an upstream promoter element, usually
located within 100 to 200 bp upstream of the TATA box. An upstream promoter element determines the rate at
which transcription is initiated and can act in either orientation [Sambrook et al. (1989) "Expression of Cloned
Genes in Mammalian Cells." In Molecular Cloning: A Laboratory Manual, 2nd ed.].
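
The positional relationship described above can be illustrated with a short sequence scan. The sketch
below looks for a TATA-like motif whose first base lies 25-30 bp upstream of a given transcription
initiation site. The motif pattern TATA[AT]A[AT], the convention of measuring distance from the start
of the motif, the function name and the example sequence are all assumptions made for illustration;
the scan reports non-overlapping matches only.

```python
import re

# Assumed TATA-like consensus; real promoters vary considerably.
TATA_PATTERN = re.compile(r"TATA[AT]A[AT]")


def tata_boxes_upstream(sequence: str, tss_index: int,
                        min_dist: int = 25, max_dist: int = 30) -> list:
    """Return (motif_start, distance) pairs for TATA-like motifs upstream of the TSS.

    `tss_index` is the 0-based position of the transcription initiation site
    on the same strand as `sequence`.
    """
    hits = []
    for found in TATA_PATTERN.finditer(sequence.upper()):
        distance = tss_index - found.start()
        if min_dist <= distance <= max_dist:
            hits.append((found.start(), distance))
    return hits


# Made-up promoter: motif starts at position 20, initiation site at position 50.
promoter = "G" * 20 + "TATAAAA" + "C" * 23 + "ATGGCT"
print(tata_boxes_upstream(promoter, tss_index=50))
```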

Mammalian viral genes are often highly expressed and have a broad host range; therefore sequences encoding
mammalian viral genes provide particularly useful promoter sequences. Examples include the SV40 early
promoter, mouse mammary tumor virus LTR promoter, adenovirus major late promoter (Ad MLP), and herpes
simplex virus promoter. In addition, sequences derived from non-viral genes, such as the murine
metallothionein gene, also provide useful promoter sequences. Expression may be either constitutive or
regulated (inducible); depending on the promoter, it can be induced with glucocorticoid in hormone-responsive
cells.

The presence of an enhancer element (enhancer), combined with the promoter elements described above, will
usually increase expression levels. An enhancer is a regulatory DNA sequence that can stimulate transcription up
to 1000-fold when linked to homologous or heterologous promoters, with synthesis beginning at the normal
RNA start site. Enhancers are also active when they are placed upstream or downstream from the transcription
initiation site, in either normal or flipped orientation, or at a distance of more than 1000 nucleotides from the
promoter [Maniatis et al. (1987) Science 236:1237; Alberts et al. (1989) Molecular Biology of the Cell, 2nd ed.].
Enhancer elements derived from viruses may be particularly useful, because they usually have a broader host
range. Examples include the SV40 early gene enhancer [Dijkema et al (1985) EMBO J. 4:761] and the
enhancer/promoters derived from the long terminal repeat (LTR) of the Rous Sarcoma Virus [Gorman et al.
(1982b) Proc. Natl. Acad. Sci. 79:6777] and from human cytomegalovirus [Boshart et al. (1985) Cell 41:521].
Additionally, some enhancers are regulatable and become active only in the presence of an inducer, such as a
hormone or metal ion [Sassone-Corsi and Borelli (1986) Trends Genet. 2:215; Maniatis et al. (1987) Science
236:1237].

A DNA molecule may be expressed intracellularly in mammalian cells. A promoter sequence may be directly
linked with the DNA molecule, in which case the first amino acid at the N-terminus of the recombinant protein
will always be a methionine, which is encoded by the ATG start codon. If desired, the N-terminus may be
cleaved from the protein by in vitro incubation with cyanogen bromide.

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating chimeric
DNA molecules that encode a fusion protein comprised of a leader sequence fragment that provides for secretion
of the foreign protein in mammalian cells. Preferably, there are processing sites encoded between the leader
fragment and the foreign gene that can be cleaved either in vivo or in vitro. The leader sequence fragment
usually encodes a signal peptide comprised of hydrophobic amino acids which direct the secretion of the protein
from the cell. The adenovirus tripartite leader is an example of a leader sequence that provides for secretion of a
foreign protein in mammalian cells.

Usually, transcription termination and polyadenylation sequences recognized by mammalian cells are regulatory
regions located 3' to the translation stop codon and thus, together with the promoter elements, flank the coding
sequence. The 3' terminus of the mature mRNA is formed by site-specific post-transcriptional cleavage and
polyadenylation [Birnstiel et al. (1985) Cell 41:349; Proudfoot and Whitelaw (1988) "Termination and 3' end
processing of eukaryotic RNA." In Transcription and splicing (ed. B.D. Hames and D.M. Glover); Proudfoot
(1989) Trends Biochem. Sci. 14:105]. These sequences direct the transcription of an mRNA which can be
translated into the polypeptide encoded by the DNA. Examples of transcription terminator/polyadenylation
signals include those derived from SV40 [Sambrook et al (1989) "Expression of cloned genes in cultured
mammalian cells." In Molecular Cloning: A Laboratory Manual].

Usually, the above described components, comprising a promoter, polyadenylation signal, and transcription
termination sequence are put together into expression constructs. Enhancers, introns with functional splice donor
and acceptor sites, and leader sequences may also be included in an expression construct, if desired. Expression
constructs are often maintained in a replicon, such as an extrachromosomal element (eg. plasmids) capable of
stable maintenance in a host, such as mammalian cells or bacteria. Mammalian replication systems include those
derived from animal viruses, which require trans-acting factors to replicate. For example, plasmids containing
the replication systems of papovaviruses, such as SV40 [Gluzman (1981) Cell 25:175] or polyomavirus,
replicate to extremely high copy number in the presence of the appropriate viral T antigen. Additional examples
of mammalian replicons include those derived from bovine papillomavirus and Epstein-Barr virus. Additionally,
the replicon may have two replication systems, thus allowing it to be maintained, for example, in mammalian
cells for expression and in a prokaryotic host for cloning and amplification. Examples of such mammalian-bacteria
shuttle vectors include pMT2 [Kaufman et al. (1989) Mol. Cell. Biol. 9:946] and pHEBO [Shimizu et al.
(1986) Mol. Cell. Biol. 5:1074].

The transformation procedure used depends upon the host to be transformed. Methods for introduction of
heterologous polynucleotides into mammalian cells are known in the art and include dextran-mediated
transfection, calcium phosphate precipitation, polybrene mediated transfection, protoplast fusion,
electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA into
nuclei.

Mammalian cell lines available as hosts for expression are known in the art and include many immortalized cell
lines available from the American Type Culture Collection (ATCC), including but not limited to, Chinese
hamster ovary (CHO) cells, HeLa cells, baby hamster kidney (BHK) cells, monkey kidney cells (COS), human
hepatocellular carcinoma cells (eg. Hep G2), and a number of other cell lines.
ii. Baculovirus Systems

The polynucleotide encoding the protein can also be inserted into a suitable insect expression vector, and is
operably linked to the control elements within that vector. Vector construction employs techniques which are
known in the art. Generally, the components of the expression system include a transfer vector, usually a
bacterial plasmid, which contains both a fragment of the baculovirus genome, and a convenient restriction site
for insertion of the heterologous gene or genes to be expressed; a wild type baculovirus with a sequence
homologous to the baculovirus-specific fragment in the transfer vector (this allows for the homologous
recombination of the heterologous gene into the baculovirus genome); and appropriate insect host cells and
growth media.

After inserting the DNA sequence encoding the protein into the transfer vector, the vector and the wild type viral
genome are transfected into an insect host cell where the vector and viral genome are allowed to recombine. The
packaged recombinant virus is expressed and recombinant plaques are identified and purified. Materials and
methods for baculovirus/insect cell expression systems are commercially available in kit form from, inter alia,
Invitrogen, San Diego CA ("MaxBac" kit). These techniques are generally known to those skilled in the art and
fully described in Summers and Smith, Texas Agricultural Experiment Station Bulletin No. 1555 (1987)
(hereinafter "Summers and Smith").

Prior to inserting the DNA sequence encoding the protein into the baculovirus genome, the above described
components, comprising a promoter, leader (if desired), coding sequence, and transcription termination
sequence, are usually assembled into an intermediate transplacement construct (transfer vector). This may
contain a single gene and operably linked regulatory elements; multiple genes, each with its own set of
operably linked regulatory elements; or multiple genes, regulated by the same set of regulatory elements.
Intermediate transplacement constructs are often maintained in a replicon, such as an extra-chromosomal
element (e.g. plasmids) capable of stable maintenance in a host, such as a bacterium. The replicon will have a
replication system, thus allowing it to be maintained in a suitable host for cloning and amplification.
Currently, the most commonly used transfer vector for introducing foreign genes into AcNPV is pAc373. Many
other vectors, known to those of skill in the art, have also been designed. These include, for example, pVL985
(which alters the polyhedrin start codon from ATG to ATT, and which introduces a BamHI cloning site 32
basepairs downstream from the ATT; see Luckow and Summers, Virology (1989) 17:31).
The plasmid usually also contains the polyhedrin polyadenylation signal (Miller et al. (1988) Ann. Rev.
Microbiol. 42:111) and a prokaryotic ampicillin-resistance (amp) gene and origin of replication for selection
and propagation in E. coli.

Baculovirus transfer vectors usually contain a baculovirus promoter. A baculovirus promoter is any DNA
sequence capable of binding a baculovirus RNA polymerase and initiating the downstream (5' to 3') transcription
of a coding sequence (eg. structural gene) into mRNA. A promoter will have a transcription initiation region
which is usually placed proximal to the 5' end of the coding sequence. This transcription initiation region usually
includes an RNA polymerase binding site and a transcription initiation site. A baculovirus transfer vector may
also have a second domain called an enhancer, which, if present, is usually distal to the structural gene.
Expression may be either regulated or constitutive.

Structural genes, abundantly transcribed at late times in a viral infection cycle, provide particularly useful
promoter sequences. Examples include sequences derived from the gene encoding the viral polyhedron protein,
Friesen et al., (1986) "The Regulation of Baculovirus Gene Expression," in: The Molecular Biology of
Baculoviruses (ed. Walter Doerfler); EPO Publ. Nos. 127 839 and 155 476; and the gene encoding the p10
protein, Vlak et al., (1988), J. Gen. Virol. 69:765.

DNA encoding suitable signal sequences can be derived from genes for secreted insect or baculovirus proteins,
such as the baculovirus polyhedrin gene (Carbonell et al. (1988) Gene, 73:409). Alternatively, since the signals
for mammalian cell posttranslational modifications (such as signal peptide cleavage, proteolytic cleavage, and
phosphorylation) appear to be recognized by insect cells, and the signals required for secretion and nuclear
accumulation also appear to be conserved between the invertebrate cells and vertebrate cells, leaders of
non-insect origin, such as those derived from genes encoding human α-interferon, Maeda et al., (1985), Nature
315:592; human gastrin-releasing peptide, Lebacq-Verheyden et al., (1988), Molec. Cell. Biol. 8:3129; human
IL-2, Smith et al., (1985) Proc. Nat'l Acad. Sci. USA, 82:8404; mouse IL-3, Miyajima et al., (1987) Gene
58:273; and human glucocerebrosidase, Martin et al. (1988) DNA, 7:99, can also be used to provide for secretion
in insects.

A recombinant polypeptide or polyprotein may be expressed intracellularly or, if it is expressed with the proper
regulatory sequences, it can be secreted. Good intracellular expression of nonfused foreign proteins usually
requires heterologous genes that ideally have a short leader sequence containing suitable translation initiation
signals preceding an ATG start signal. If desired, methionine at the N-terminus may be cleaved from the mature
protein by in vitro incubation with cyanogen bromide.

Alternatively, recombinant polyproteins or proteins which are not naturally secreted can be secreted from the
insect cell by creating chimeric DNA molecules that encode a fusion protein comprised of a leader sequence
fragment that provides for secretion of the foreign protein in insects. The leader sequence fragment usually
encodes a signal peptide comprised of hydrophobic amino acids which direct the translocation of the protein into
the endoplasmic reticulum.

After insertion of the DNA sequence and/or the gene encoding the expression product precursor of the protein,
an insect cell host is co-transformed with the heterologous DNA of the transfer vector and the genomic DNA of
wild type baculovirus - usually by co-transfection. The promoter and transcription termination sequence of the
construct will usually comprise a 2-5kb section of the baculovirus genome. Methods for introducing
heterologous DNA into the desired site in the baculovirus virus are known in the art (See Summers and Smith
supra; Ju et al. (1987); Smith et al., Mol. Cell. Biol. (1983) 3:2156; and Luckow and Summers (1989)). For
example, the insertion can be into a gene such as the polyhedrin gene, by homologous double crossover
recombination; insertion can also be into a restriction enzyme site engineered into the desired baculovirus gene.
Miller et al., (1989), Bioessays 4:91. The DNA sequence, when cloned in place of the polyhedrin gene in the
expression vector, is flanked both 5' and 3' by polyhedrin-specific sequences and is positioned downstream of
the polyhedrin promoter.

The newly formed baculovirus expression vector is subsequently packaged into an infectious recombinant
baculovirus. Homologous recombination occurs at low frequency (between about 1% and about 5%); thus, the
majority of the virus produced after cotransfection is still wild-type virus. Therefore, a method is necessary to
identify recombinant viruses. An advantage of the expression system is a visual screen allowing recombinant
viruses to be distinguished. The polyhedrin protein, which is produced by the native virus, is produced at very
high levels in the nuclei of infected cells at late times after viral infection. Accumulated polyhedrin protein
forms occlusion bodies that also contain embedded particles. These occlusion bodies, up to 15 µm in size, are
highly refractile, giving them a bright shiny appearance that is readily visualized under the light microscope.
Cells infected with recombinant viruses lack occlusion bodies. To distinguish recombinant virus from wild-type
virus, the transfection supernatant is plaqued onto a monolayer of insect cells by techniques known to those
skilled in the art. Namely, the plaques are screened under the light microscope for the presence (indicative of
wild-type virus) or absence (indicative of recombinant virus) of occlusion bodies. "Current Protocols in
Microbiology" Vol. 2 (Ausubel et al. eds) at 16.8 (Supp. 10, 1990); Summers and Smith, supra; Miller et al.
(1989).

Recombinant baculovirus expression vectors have been developed for infection into several insect cells. For
example, recombinant baculoviruses have been developed for, inter alia: Aedes aegypti, Autographa
californica, Bombyx mori, Drosophila melanogaster, Spodoptera frugiperda, and Trichoplusia ni (WO
89/046699; Carbonell et al., (1985) J. Virol. 55:153; Wright (1986) Nature 321:119; Smith et al., (1983) Mol.
Cell. Biol. 3:2156; and see generally, Fraser, et al. (1989) In Vitro Cell. Dev. Biol. 25:225).
Cells and cell culture media are commercially available for both direct and fusion expression of heterologous
polypeptides in a baculovirus/expression system; cell culture technology is generally known to those skilled in
the art. See, eg. Summers and Smith supra.

The modified insect cells may then be grown in an appropriate nutrient medium, which allows for stable
maintenance of the plasmid(s) present in the modified insect host. Where the expression product gene is under
inducible control, the host may be grown to high density, and expression induced. Alternatively, where
expression is constitutive, the product will be continuously expressed into the medium and the nutrient medium
must be continuously circulated, while removing the product of interest and augmenting depleted nutrients. The
product may be purified by such techniques as chromatography, eg. HPLC, affinity chromatography, ion
exchange chromatography, etc.; electrophoresis; density gradient centrifugation; solvent extraction, etc. As
appropriate, the product may be further purified, as required, so as to remove substantially any insect proteins
which are also present in the medium, so as to provide a product which is at least substantially free of host
debris, eg. proteins, lipids and polysaccharides.

In order to obtain protein expression, recombinant host cells derived from the transformants are incubated under
conditions which allow expression of the recombinant protein encoding sequence. These conditions will vary,
dependent upon the host cell selected. However, the conditions are readily ascertainable to those of ordinary skill
in the art, based upon what is known in the art.
iii. Plant Systems

There are many plant cell culture and whole plant genetic expression systems known in the art. Exemplary plant
cellular genetic expression systems include those described in patents, such as: US 5,693,506; US 5,659,122;
and US 5,608,143. Additional examples of genetic expression in plant cell culture have been described by Zenk,
Phytochemistry 30:3861-3863 (1991). Descriptions of plant protein signal peptides may be found in addition to
the references described above in Vaulcombe et al., Mol. Gen. Genet. 209:33-40 (1987); Chandler et al., Plant
Molecular Biology 3:407-418 (1984); Rogers, J. Biol. Chem. 260:3731-3738 (1985); Rothstein et al., Gene
55:353-356 (1987); Whittier et al., Nucleic Acids Research 15:2515-2535 (1987); Wirsel et al., Molecular
Microbiology 3:3-14 (1989); Yu et al., Gene 122:247-253 (1992). A description of the regulation of plant gene
expression by the phytohormone, gibberellic acid and secreted enzymes induced by gibberellic acid can be found
in R.L. Jones and J. MacMillin, Gibberellins, in: Advanced Plant Physiology, Malcolm B. Wilkins, ed., 1984
Pitman Publishing Limited, London, pp. 21-52. References that describe other metabolically-regulated genes:
Sheen, Plant Cell, 2:1027-1038 (1990); Maas et al., EMBO J. 9:3447-3452 (1990); Benkel and Hickey, Proc.
Natl. Acad. Sci. 84:1337-1339 (1987).

Typically, using techniques known in the art, a desired polynucleotide sequence is inserted into an expression
cassette comprising genetic regulatory elements designed for operation in plants. The expression cassette is
inserted into a desired expression vector with companion sequences upstream and downstream from the
expression cassette suitable for expression in a plant host. The companion sequences will be of plasmid or viral
origin and provide necessary characteristics to the vector to permit the vectors to move DNA from an original
cloning host, such as bacteria, to the desired plant host. The basic bacterial/plant vector construct will preferably
provide a broad host range prokaryote replication origin; a prokaryote selectable marker; and, for Agrobacterium
transformations, T DNA sequences for Agrobacterium-mediated transfer to plant chromosomes. Where the
heterologous gene is not readily amenable to detection, the construct will preferably also have a selectable
marker gene suitable for determining if a plant cell has been transformed. A general review of suitable markers,
for example for the members of the grass family, is found in Wilmink and Dons, 1993, Plant Mol. Biol. Reptr,
11(2):165-185.

Sequences suitable for permitting integration of the heterologous sequence into the plant genome are also
recommended. These might include transposon sequences and the like for homologous recombination as well as
Ti sequences which permit random insertion of a heterologous expression cassette into a plant genome. Suitable
prokaryote selectable markers include resistance toward antibiotics such as ampicillin or tetracycline. Other
DNA sequences encoding additional functions may also be present in the vector, as is known in the art.

The nucleic acid molecules of the subject invention may be included into an expression cassette for expression
of the protein(s) of interest. Usually, there will be only one expression cassette, although two or more are
feasible. The recombinant expression cassette will contain in addition to the heterologous protein encoding
sequence the following elements: a promoter region, plant 5' untranslated sequences, initiation codon depending
upon whether or not the structural gene comes equipped with one, and a transcription and translation termination
sequence. Unique restriction enzyme sites at the 5' and 3' ends of the cassette allow for easy insertion into a
pre-existing vector.

A heterologous coding sequence may be for any protein relating to the present invention. The sequence encoding
the protein of interest will encode a signal peptide which allows processing and translocation of the protein, as
appropriate, and will usually lack any sequence which might result in the binding of the desired protein of the
invention to a membrane. Since, for the most part, the transcriptional initiation region will be for a gene which is
expressed and translocated during germination, by employing the signal peptide which provides for
translocation, one may also provide for translocation of the protein of interest. In this way, the protein(s) of
interest will be translocated from the cells in which they are expressed and may be efficiently harvested.
Typically, secretion in seeds is across the aleurone or scutellar epithelium layer into the endosperm of the seed.
While it is not required that the protein be secreted from the cells in which the protein is produced, this
facilitates the isolation and purification of the recombinant protein.

Since the ultimate expression of the desired gene product will be in a eukaryotic cell, it is desirable to determine
whether any portion of the cloned gene contains sequences which will be processed out as introns by the host's
spliceosome machinery. If so, site-directed mutagenesis of the "intron" region may be conducted to prevent losing
a portion of the genetic message as a false intron code. Reed and Maniatis, Cell 41:95-105, 1985.
The vector can be microinjected directly into plant cells by use of micropipettes to mechanically transfer the
recombinant DNA. Crossway, Mol. Gen. Genet, 202:179-185, 1985. The genetic material may also be
transferred into the plant cell by using polyethylene glycol, Krens, et al., Nature, 296, 72-74, 1982. Another
method of introduction of nucleic acid segments is high velocity ballistic penetration by small particles with the
nucleic acid either within the matrix of small beads or particles, or on the surface, Klein, et al., Nature, 327,
70-73, 1987 and Knudsen and Muller, 1991, Planta, 185:330-336 teaching particle bombardment of barley
endosperm to create transgenic barley. Yet another method of introduction would be fusion of protoplasts with
other entities, either minicells, cells, lysosomes or other fusible lipid-surfaced bodies, Fraley, et al., Proc. Natl.
Acad. Sci. USA, 79, 1859-1863, 1982.

The vector may also be introduced into the plant cells by electroporation (Fromm et al., Proc. Natl. Acad. Sci.
USA 82:5824, 1985). In this technique, plant protoplasts are electroporated in the presence of plasmids
containing the gene construct. Electrical impulses of high field strength reversibly permeabilize biomembranes
allowing the introduction of the plasmids. Electroporated plant protoplasts reform the cell wall, divide, and form
plant callus.

All plants from which protoplasts can be isolated and cultured to give whole regenerated plants can be
transformed by the present invention so that whole plants are recovered which contain the transferred gene. It is
known that practically all plants can be regenerated from cultured cells or tissues, including but not limited to all
major species of sugarcane, sugar beet, cotton, fruit and other trees, legumes and vegetables. Some suitable
plants include, for example, species from the genera Fragaria, Lotus, Medicago, Onobrychis, Trifolium,
Trigonella, Vigna, Citrus, Linum, Geranium, Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis,
Atropa, Capsicum, Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana,
Cichorium, Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis, Nemesia, Pelargonium,
Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browallia, Glycine, Lolium, Zea, Triticum,
Sorghum, and Datura.

Means for regeneration vary from species to species of plants, but generally a suspension of transformed
protoplasts containing copies of the heterologous gene is first provided. Callus tissue is formed and shoots may
be induced from callus and subsequently rooted. Alternatively, embryo formation can be induced from the
protoplast suspension. These embryos germinate as natural embryos to form plants. The culture media will
generally contain various amino acids and hormones, such as auxin and cytokinins. It is also advantageous to
add glutamic acid and proline to the medium, especially for such species as corn and alfalfa. Shoots and roots
normally develop simultaneously. Efficient regeneration will depend on the medium, on the genotype, and on
the history of the culture. If these three variables are controlled, then regeneration is fully reproducible and
repeatable.

In some plant cell culture systems, the desired protein of the invention may be excreted or alternatively, the
protein may be extracted from the whole plant. Where the desired protein of the invention is secreted into the
medium, it may be collected. Alternatively, the embryos and embryoless-half seeds or other plant tissue may be
mechanically disrupted to release any secreted protein between cells and tissues. The mixture may be suspended
in a buffer solution to retrieve soluble proteins. Conventional protein isolation and purification methods will be
then used to purify the recombinant protein. Parameters of time, temperature, pH, oxygen, and volumes will be
adjusted through routine methods to optimize expression and recovery of heterologous protein.
iv. Bacterial Systems

Bacterial expression techniques are known in the art. A bacterial promoter is any DNA sequence capable of
binding bacterial RNA polymerase and initiating the downstream (3') transcription of a coding sequence (eg.
structural gene) into mRNA. A promoter will have a transcription initiation region which is usually placed
proximal to the 5' end of the coding sequence. This transcription initiation region usually includes an RNA
polymerase binding site and a transcription initiation site. A bacterial promoter may also have a second domain
called an operator, that may overlap an adjacent RNA polymerase binding site at which RNA synthesis begins.
The operator permits negative regulated (inducible) transcription, as a gene repressor protein may bind the
operator and thereby inhibit transcription of a specific gene. Constitutive expression may occur in the absence of
negative regulatory elements, such as the operator. In addition, positive regulation may be achieved by a gene
activator protein binding sequence, which, if present, is usually proximal (5') to the RNA polymerase binding
sequence. An example of a gene activator protein is the catabolite activator protein (CAP), which helps initiate
transcription of the lac operon in Escherichia coli (E. coli) [Raibaud et al. (1984) Annu. Rev. Genet. 18:173].
Regulated expression may therefore be either positive or negative, thereby either enhancing or reducing
transcription.

Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences. Examples
include promoter sequences derived from sugar metabolizing enzymes, such as galactose, lactose (lac) [Chang et
al. (1977) Nature 198:1056], and maltose. Additional examples include promoter sequences derived from
biosynthetic enzymes such as tryptophan (trp) [Goeddel et al. (1980) Nuc. Acids Res. 5:4057; Yelverton et al.
(1981) Nucl. Acids Res. 9:731; US patent 4,738,921; EP-A-0036776 and EP-A-0121775]. The β-lactamase (bla)
promoter system [Weissmann (1981) "The cloning of interferon and other mistakes." In Interferon 3 (ed. I.
Gresser)], bacteriophage lambda PL [Shimatake et al. (1981) Nature 292:128] and T5 [US patent 4,689,406]
promoter systems also provide useful promoter sequences.

In addition, synthetic promoters which do not occur in nature also function as bacterial promoters. For example,
transcription activation sequences of one bacterial or bacteriophage promoter may be joined with the operon
sequences of another bacterial or bacteriophage promoter, creating a synthetic hybrid promoter [US
patent 4,551,433]. For example, the tac promoter is a hybrid trp-lac promoter comprised of both trp promoter
and lac operon sequences that is regulated by the lac repressor [Amann et al. (1983) Gene 25:167; de Boer et al.
(1983) Proc. Natl. Acad. Sci. 80:21]. Furthermore, a bacterial promoter can include naturally occurring
promoters of non-bacterial origin that have the ability to bind bacterial RNA polymerase and initiate
transcription. A naturally occurring promoter of non-bacterial origin can also be coupled with a compatible RNA
polymerase to produce high levels of expression of some genes in prokaryotes. The bacteriophage T7 RNA
polymerase/promoter system is an example of a coupled promoter system [Studier et al. (1986) J. Mol. Biol.
189:113; Tabor et al. (1985) Proc. Natl. Acad. Sci. 82:1074]. In addition, a hybrid promoter can also be
comprised of a bacteriophage promoter and an E. coli operator region (EPO-A-0 267 851).
In addition to a functioning promoter sequence, an efficient ribosome binding site is also useful for the
expression of foreign genes in prokaryotes. In E. coli, the ribosome binding site is called the Shine-Dalgarno
(SD) sequence and includes an initiation codon (ATG) and a sequence 3-9 nucleotides in length located 3-11
nucleotides upstream of the initiation codon [Shine et al. (1975) Nature 254:34]. The SD sequence is thought to
promote binding of mRNA to the ribosome by the pairing of bases between the SD sequence and the 3' end of E.
coli 16S rRNA [Steitz et al. (1979) "Genetic signals and nucleotide sequences in messenger RNA." In Biological
Regulation and Development: Gene Expression (ed. R.F. Goldberger)]. For the expression of eukaryotic genes and
prokaryotic genes with a weak ribosome-binding site, see Sambrook et al. (1989) "Expression of cloned genes in
Escherichia coli." In Molecular Cloning: A Laboratory Manual.
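
The spacing rule for the Shine-Dalgarno sequence can be illustrated with a short Python scan that reports each
ATG preceded by an SD-like motif. The consensus AGGAGG, the minimum motif length of 4 nucleotides, the
reading of "3-11 nucleotides upstream" as the gap between the end of the motif and the A of the ATG, and all
function names are assumptions made for this sketch. Real ribosome binding sites vary and exact matching of
this kind will miss many of them.

```python
SD_CONSENSUS = "AGGAGG"  # assumed consensus, used only for illustration


def candidate_motifs(consensus: str, min_len: int) -> list[str]:
    """All substrings of the consensus with at least min_len bases, longest first."""
    subs = {
        consensus[i:j]
        for i in range(len(consensus))
        for j in range(i + min_len, len(consensus) + 1)
    }
    return sorted(subs, key=lambda s: (-len(s), s))


def find_sd_sites(dna: str, min_len: int = 4, min_gap: int = 3, max_gap: int = 11):
    """Return (atg_index, motif, gap) for each ATG with an upstream SD-like motif.

    gap is the number of nucleotides between the motif and the A of the ATG.
    """
    dna = dna.upper()
    motifs = candidate_motifs(SD_CONSENSUS, min_len)
    hits = []
    start = dna.find("ATG")
    while start != -1:
        best = None
        for motif in motifs:  # longest motifs are tried first
            for gap in range(min_gap, max_gap + 1):
                m_start = start - gap - len(motif)
                if m_start >= 0 and dna[m_start:m_start + len(motif)] == motif:
                    best = (motif, gap)
                    break
            if best:
                break
        if best:
            hits.append((start, best[0], best[1]))
        start = dna.find("ATG", start + 1)
    return hits


if __name__ == "__main__":
    # Invented sequence: full consensus placed 6 nt upstream of an ATG.
    print(find_sd_sites("TTTAGGAGGTTTAACATGAAA"))
```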

A DNA molecule may be expressed intracellularly. A promoter sequence may be directly linked with the DNA
molecule, in which case the first amino acid at the N-terminus will always be a methionine, which is encoded by
the ATG start codon. If desired, methionine at the N-terminus may be cleaved from the protein by in vitro
incubation with cyanogen bromide or by either in vivo or in vitro incubation with a bacterial methionine
N-terminal peptidase (EPO-A-0 219 237).

Fusion proteins provide an alternative to direct expression. Usually, a DNA sequence encoding the N-terminal
portion of an endogenous bacterial protein, or other stable protein, is fused to the 5' end of heterologous coding
sequences. Upon expression, this construct will provide a fusion of the two amino acid sequences. For example,
the bacteriophage lambda cII gene can be linked at the 5' terminus of a foreign gene and expressed in bacteria.
The resulting fusion protein preferably retains a site for a processing enzyme (factor Xa) to cleave the
bacteriophage protein from the foreign gene [Nagai et al. (1984) Nature 309:810]. Fusion proteins can also be
made with sequences from the lacZ [Jia et al. (1987) Gene 60:197], trpE [Allen et al. (1987) J. Biotechnol. 5:93;
Makoff et al. (1989) J. Gen. Microbiol. 135:11], and Chey [EP-A-0 324 647] genes. The DNA sequence at the
junction of the two amino acid sequences may or may not encode a cleavable site. Another example is a
ubiquitin fusion protein. Such a fusion protein is made with the ubiquitin region that preferably retains a site for
a processing enzyme (eg. ubiquitin specific processing-protease) to cleave the ubiquitin from the foreign protein.
Through this method, native foreign protein can be isolated [Miller et al. (1989) Bio/Technology 7:698].
Alternatively, foreign proteins can also be secreted from the cell by creating chimeric DNA molecules that
encode a fusion protein comprised of a signal peptide sequence fragment that provides for secretion of the
foreign protein in bacteria [US patent 4,336,336]. The signal sequence fragment usually encodes a signal peptide
comprised of hydrophobic amino acids which direct the secretion of the protein from the cell. The protein is
either secreted into the growth media (gram-positive bacteria) or into the periplasmic space, located between the
inner and outer membrane of the cell (gram-negative bacteria). Preferably there are processing sites, which can
be cleaved either in vivo or in vitro, encoded between the signal peptide fragment and the foreign gene.
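
The hydrophobic character of a signal peptide can be illustrated with a sliding-window hydropathy average.
The Python sketch below uses the Kyte-Doolittle hydropathy scale, which is not mentioned in this document and
is introduced here only as an assumption for illustration. The function name, the window size of 7 and the
example sequence are likewise hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale, not taken from this document).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 7) -> list[float]:
    """Mean hydropathy of every window of the given size along the sequence."""
    protein = protein.upper()
    if window < 1 or window > len(protein):
        return []
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in protein[i:i + window]) / window
        for i in range(len(protein) - window + 1)
    ]


if __name__ == "__main__":
    # Invented N-terminal sequence with a hydrophobic core.
    profile = hydropathy_profile("MKKLLLALLALLAGSAQAAD")
    peak = max(range(len(profile)), key=profile.__getitem__)
    print("most hydrophobic window starts at residue", peak + 1)
```

Windows with a high positive mean indicate a hydrophobic stretch of the kind described above. A dedicated
signal peptide predictor would be needed for any real assessment.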

DNA encoding suitable signal sequences can be derived from genes for secreted bacterial proteins, such as the
E. coli outer membrane protein gene (ompA) [Masui et al. (1983), in: Experimental Manipulation of Gene
Expression; Ghrayeb et al. (1984) EMBO J. 3:2437] and the E. coli alkaline phosphatase signal sequence (phoA)
[Oka et al. (1985) Proc. Natl. Acad. Sci. 82:7212]. As an additional example, the signal sequence of the
alpha-amylase gene from various Bacillus strains can be used to secrete heterologous proteins from B. subtilis
[Palva et al. (1982) Proc. Natl. Acad. Sci. USA 79:5582; EP-A-0 244 042].

Usually, transcription termination sequences recognized by bacteria are regulatory regions located 3' to the
translation stop codon, and thus together with the promoter flank the coding sequence. These sequences direct
the transcription of an mRNA which can be translated into the polypeptide encoded by the DNA. Transcription
termination sequences frequently include DNA sequences of about 50 nucleotides capable of forming stem loop
structures that aid in terminating transcription. Examples include transcription termination sequences derived
from genes with strong promoters, such as the trp gene in E. coli as well as other biosynthetic genes.
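
As an illustration of what "capable of forming stem loop structures" means at the sequence level, the following is a minimal sketch that scans a DNA string for perfect inverted repeats separated by a short loop. The function names, the stem and loop lengths, and the example sequence are assumptions made for illustration only; they are not taken from this specification, and real terminators tolerate mismatches that this sketch ignores.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def find_stem_loops(seq: str, stem: int = 6, min_loop: int = 3, max_loop: int = 8):
    """Yield (start, stem_sequence, loop_sequence) for perfect inverted repeats.

    Assumed parameters: a fixed stem length and a small range of loop lengths.
    """
    seq = seq.upper()
    for start in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[start:start + stem]
        for loop in range(min_loop, max_loop + 1):
            right_start = start + stem + loop
            right = seq[right_start:right_start + stem]
            if len(right) < stem:
                break  # ran off the end of the sequence
            if right == reverse_complement(left):
                yield start, left, seq[start + stem:right_start]


# Hypothetical terminator-like sequence: GC-rich stem, 4-nt loop, T-rich tail.
example = "CCGCCCGCTTTTGCGGGCTTTTTTT"
hits = list(find_stem_loops(example))
```

Here `hits` holds one tuple per candidate hairpin, giving the start position, the 5' arm of the stem and the loop sequence.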

Usually, the above described components, comprising a promoter, signal sequence (if desired), coding sequence
of interest, and transcription termination sequence, are put together into expression constructs. Expression
constructs are often maintained in a replicon, such as an extrachromosomal element (eg. plasmids) capable of
stable maintenance in a host, such as bacteria. The replicon will have a replication system, thus allowing it to be
maintained in a prokaryotic host either for expression or for cloning and amplification. In addition, a replicon
may be either a high or low copy number plasmid. A high copy number plasmid will generally have a copy
number ranging from about 5 to about 200, and usually about 10 to about 150. A host containing a high copy
number plasmid will preferably contain at least about 10, and more preferably at least about 20 plasmids. Either
a high or low copy number vector may be selected, depending upon the effect of the vector and the foreign
protein on the host.

Alternatively, the expression constructs can be integrated into the bacterial genome with an integrating vector.
Integrating vectors usually contain at least one sequence homologous to the bacterial chromosome that allows
the vector to integrate. Integrations appear to result from recombinations between homologous DNA in the
vector and the bacterial chromosome. For example, integrating vectors constructed with DNA from various
Bacillus strains integrate into the Bacillus chromosome (EP-A-0 127 328). Integrating vectors may also be
comprised of bacteriophage or transposon sequences.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers to allow for
the selection of bacterial strains that have been transformed. Selectable markers can be expressed in the bacterial
host and may include genes which render bacteria resistant to drugs such as ampicillin, chloramphenicol,
erythromycin, kanamycin (neomycin), and tetracycline [Davies et al. (1978) Annu. Rev. Microbiol. 32:469].
Selectable markers may also include biosynthetic genes, such as those in the histidine, tryptophan, and leucine
biosynthetic pathways.

Alternatively, some of the above described components can be put together in transformation vectors.
Transformation vectors are usually comprised of a selectable marker that is either maintained in a replicon or
developed into an integrating vector, as described above.

Expression and transformation vectors, either extrachromosomal replicons or integrating vectors, have been
developed for transformation into many bacteria. For example, expression vectors have been developed for, inter
alia, the following bacteria: Bacillus subtilis [Palva et al. (1982) Proc. Natl. Acad. Sci. USA 79:5582; EP-A-0
036 259 and EP-A-0 063 953; WO 84/04541], Escherichia coli [Shimatake et al. (1981) Nature 292:128; Amann
et al. (1985) Gene 40:183; Studier et al. (1986) J. Mol. Biol. 189:113; EP-A-0 036 776, EP-A-0 136 829 and
EP-A-0 136 907], Streptococcus cremoris [Powell et al. (1988) Appl. Environ. Microbiol. 54:655], Streptococcus
lividans [Powell et al. (1988) Appl. Environ. Microbiol. 54:655], Streptomyces lividans [US patent 4,745,056].

Methods of introducing exogenous DNA into bacterial hosts are well-known in the art, and usually include
either the transformation of bacteria treated with CaCl2 or other agents, such as divalent cations and DMSO.
DNA can also be introduced into bacterial cells by electroporation. Transformation procedures usually vary with
the bacterial species to be transformed. See eg. [Masson et al. (1989) FEMS Microbiol. Lett. 60:273; Palva et al.
(1982) Proc. Natl. Acad. Sci. USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541, Bacillus],
[Miller et al. (1988) Proc. Natl. Acad. Sci. 85:856; Wang et al. (1990) J. Bacteriol. 172:949, Campylobacter],
[Cohen et al. (1973) Proc. Natl. Acad. Sci. 69:2110; Dower et al. (1988) Nucleic Acids Res. 16:6127; Kushner
(1978) "An improved method for transformation of Escherichia coli with ColE1-derived plasmids", in: Genetic
Engineering: Proceedings of the International Symposium on Genetic Engineering (eds. H.W. Boyer and S.
Nicosia); Mandel et al. (1970) J. Mol. Biol. 53:159; Taketo (1988) Biochim. Biophys. Acta 949:318,
Escherichia], [Chassy et al. (1987) FEMS Microbiol. Lett. 44:173, Lactobacillus], [Fiedler et al. (1988) Anal.
Biochem. 170:38, Pseudomonas], [Augustin et al. (1990) FEMS Microbiol. Lett. 66:203, Staphylococcus],
[Barany et al. (1980) J. Bacteriol. 144:698; Harlander (1987) "Transformation of Streptococcus lactis by
electroporation", in: Streptococcal Genetics (ed. J. Ferretti and R. Curtiss III); Perry et al. (1981) Infect. Immun.
32:1295; Powell et al. (1988) Appl. Environ. Microbiol. 54:655; Somkuti et al. (1987) Proc. 4th Eur. Cong.
Biotechnology 1:412, Streptococcus].

V. Yeast Expression

Yeast expression systems are also known to one of ordinary skill in the art. A yeast promoter is any DNA
sequence capable of binding yeast RNA polymerase and initiating the downstream (3') transcription of a coding
sequence (eg. structural gene) into mRNA. A promoter will have a transcription initiation region which is
usually placed proximal to the 5' end of the coding sequence. This transcription initiation region usually includes
an RNA polymerase binding site (the "TATA Box") and a transcription initiation site. A yeast promoter may
also have a second domain called an upstream activator sequence (UAS), which, if present, is usually distal to
the structural gene. The UAS permits regulated (inducible) expression. Constitutive expression occurs in the
absence of a UAS. Regulated expression may be either positive or negative, thereby either enhancing or
reducing transcription.

Yeast is a fermenting organism with an active metabolic pathway; therefore, sequences encoding enzymes in the
metabolic pathway provide particularly useful promoter sequences. Examples include alcohol dehydrogenase
(ADH) (EP-A-0 284 044), enolase, glucokinase, glucose-6-phosphate isomerase, glyceraldehyde-3-phosphate-
dehydrogenase (GAP or GAPDH), hexokinase, phosphofructokinase, 3-phosphoglycerate mutase, and pyruvate
kinase (PyK) (EPO-A-0 329 203). The yeast PHO5 gene, encoding acid phosphatase, also provides useful
promoter sequences [Miyanohara et al. (1983) Proc. Natl. Acad. Sci. USA 80:1].

In addition, synthetic promoters which do not occur in nature also function as yeast promoters. For example,
UAS sequences of one yeast promoter may be joined with the transcription activation region of another yeast
promoter, creating a synthetic hybrid promoter. Examples of such hybrid promoters include the ADH regulatory
sequence linked to the GAP transcription activation region (US Patent Nos. 4,876,197 and 4,880,734). Other
examples of hybrid promoters include promoters which consist of the regulatory sequences of either the ADH2,
GAL4, GAL10, or PHO5 genes, combined with the transcriptional activation region of a glycolytic enzyme
gene such as GAP or PyK (EP-A-0 164 556). Furthermore, a yeast promoter can include naturally occurring
promoters of non-yeast origin that have the ability to bind yeast RNA polymerase and initiate transcription.
Examples of such promoters include, inter alia, [Cohen et al. (1980) Proc. Natl. Acad. Sci. USA 77:1078;
Henikoff et al. (1981) Nature 283:835; Hollenberg et al. (1981) Curr. Topics Microbiol. Immunol. 96:119;
Hollenberg et al. (1979) "The Expression of Bacterial Antibiotic Resistance Genes in the Yeast Saccharomyces
cerevisiae," in: Plasmids of Medical, Environmental and Commercial Importance (eds. K.N. Timmis and A.
Puhler); Mercerau-Puigalon et al. (1980) Gene 11:163; Panthier et al. (1980) Curr. Genet. 2:109].

A DNA molecule may be expressed intracellularly in yeast. A promoter sequence may be directly linked with
the DNA molecule, in which case the first amino acid at the N-terminus of the recombinant protein will always
be a methionine, which is encoded by the ATG start codon. If desired, methionine at the N-terminus may be
cleaved from the protein by in vitro incubation with cyanogen bromide.
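
The link between the ATG start codon and the N-terminal methionine can be illustrated with a minimal translation sketch. It uses the standard genetic code; the function name and the short coding sequence are hypothetical and serve only as an example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence from its first codon up to the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical coding sequence placed directly behind a promoter: ATG GCT AAA TAA
protein = translate("ATGGCTAAATAA")
```

Because the coding sequence begins with ATG, the first residue of `protein` is methionine (M).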

Fusion proteins provide an alternative for yeast expression systems, as well as in mammalian, baculovirus, and
bacterial expression systems. Usually, a DNA sequence encoding the N-terminal portion of an endogenous yeast
protein, or other stable protein, is fused to the 5' end of heterologous coding sequences. Upon expression, this
construct will provide a fusion of the two amino acid sequences. For example, the yeast or human superoxide
dismutase (SOD) gene can be linked at the 5' terminus of a foreign gene and expressed in yeast. The DNA
sequence at the junction of the two amino acid sequences may or may not encode a cleavable site. See eg. EP-A-
0 196 056. Another example is a ubiquitin fusion protein. Such a fusion protein is made with the ubiquitin region
that preferably retains a site for a processing enzyme (eg. ubiquitin-specific processing protease) to cleave the
ubiquitin from the foreign protein. Through this method, therefore, native foreign protein can be isolated (eg.
WO88/024066).

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating chimeric
DNA molecules that encode a fusion protein comprised of a leader sequence fragment that provides for secretion
in yeast of the foreign protein. Preferably, there are processing sites encoded between the leader fragment and
the foreign gene that can be cleaved either in vivo or in vitro. The leader sequence fragment usually encodes a
signal peptide comprised of hydrophobic amino acids which direct the secretion of the protein from the cell.

DNA encoding suitable signal sequences can be derived from genes for secreted yeast proteins, such as the yeast
invertase gene (EP-A-0 012 873; JPO. 62,096,086) and the A-factor gene (US patent 4,588,684). Alternatively,
leaders of non-yeast origin, such as an interferon leader, exist that also provide for secretion in yeast (EP-A-0
060 057).

A preferred class of secretion leaders are those that employ a fragment of the yeast alpha-factor gene, which
contains both a "pre" signal sequence, and a "pro" region. The types of alpha-factor fragments that can be
employed include the full-length pre-pro alpha factor leader (about 83 amino acid residues) as well as truncated
alpha-factor leaders (usually about 25 to about 50 amino acid residues) (US Patents 4,546,083 and 4,870,008;
EP-A-0 324 274). Additional leaders employing an alpha-factor leader fragment that provides for secretion
include hybrid alpha-factor leaders made with a presequence of a first yeast, but a pro-region from a second
yeast alpha-factor (eg. see WO 89/02463).

Usually, transcription termination sequences recognized by yeast are regulatory regions located 3' to the
translation stop codon, and thus together with the promoter flank the coding sequence. These sequences direct
the transcription of an mRNA which can be translated into the polypeptide encoded by the DNA. Examples
include transcription terminator sequences and other yeast-recognized termination sequences, such as those
coding for glycolytic enzymes.

Usually, the above described components, comprising a promoter, leader (if desired), coding sequence of
interest, and transcription termination sequence, are put together into expression constructs. Expression
constructs are often maintained in a replicon, such as an extrachromosomal element (eg. plasmids) capable of
stable maintenance in a host, such as yeast or bacteria. The replicon may have two replication systems, thus
allowing it to be maintained, for example, in yeast for expression and in a prokaryotic host for cloning and
amplification. Examples of such yeast-bacteria shuttle vectors include YEp24 [Botstein et al. (1979) Gene 8:17-
24], pCl/1 [Brake et al. (1984) Proc. Natl. Acad. Sci. USA 81:4642-4646], and YRp17 [Stinchcomb et al. (1982)
J. Mol. Biol. 158:157]. In addition, a replicon may be either a high or low copy number plasmid. A high copy
number plasmid will generally have a copy number ranging from about 5 to about 200, and usually about 10 to
about 150. A host containing a high copy number plasmid will preferably have at least about 10, and more
preferably at least about 20. Either a high or low copy number vector may be selected, depending upon the effect
of the vector and the foreign protein on the host. See eg. Brake et al., supra.

Alternatively, the expression constructs can be integrated into the yeast genome with an integrating vector.
Integrating vectors usually contain at least one sequence homologous to a yeast chromosome that allows the
vector to integrate, and preferably contain two homologous sequences flanking the expression construct.
Integrations appear to result from recombinations between homologous DNA in the vector and the yeast
chromosome [Orr-Weaver et al. (1983) Methods in Enzymol. 101:228-245]. An integrating vector may be
directed to a specific locus in yeast by selecting the appropriate homologous sequence for inclusion in the vector.
See Orr-Weaver et al., supra. One or more expression constructs may integrate, possibly affecting levels of
recombinant protein produced [Rine et al. (1983) Proc. Natl. Acad. Sci. USA 80:6750]. The chromosomal
sequences included in the vector can occur either as a single segment in the vector, which results in the
integration of the entire vector, or two segments homologous to adjacent segments in the chromosome and
flanking the expression construct in the vector, which can result in the stable integration of only the expression
construct.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers to allow for
the selection of yeast strains that have been transformed. Selectable markers may include biosynthetic genes that
can be expressed in the yeast host, such as ADE2, HIS4, LEU2, TRP1, and ALG7, and the G418 resistance gene,
which confer resistance in yeast cells to tunicamycin and G418, respectively. In addition, a suitable selectable
marker may also provide yeast with the ability to grow in the presence of toxic compounds, such as metal. For
example, the presence of CUP1 allows yeast to grow in the presence of copper ions [Butt et al. (1987)
Microbiol. Rev. 51:351].

Alternatively, some of the above described components can be put together into transformation vectors.
Transformation vectors are usually comprised of a selectable marker that is either maintained in a replicon or
developed into an integrating vector, as described above.

Expression and transformation vectors, either extrachromosomal replicons or integrating vectors, have been
developed for transformation into many yeasts. For example, expression vectors have been developed for, inter
alia, the following yeasts: Candida albicans [Kurtz, et al. (1986) Mol. Cell. Biol. 6:142], Candida maltosa
[Kunze, et al. (1985) J. Basic Microbiol. 25:141], Hansenula polymorpha [Gleeson, et al. (1986) J. Gen.
Microbiol. 132:3459; Roggenkamp et al. (1986) Mol. Gen. Genet. 202:302], Kluyveromyces fragilis [Das, et al.
(1984) J. Bacteriol. 158:1165], Kluyveromyces lactis [De Louvencourt et al. (1983) J. Bacteriol. 154:737; Van
den Berg et al. (1990) Bio/Technology 8:135], Pichia guillerimondii [Kunze et al. (1985) J. Basic Microbiol.
25:141], Pichia pastoris [Cregg, et al. (1985) Mol. Cell. Biol. 5:3376; US Patent Nos. 4,837,148 and 4,929,555],
Saccharomyces cerevisiae [Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA 75:1929; Ito et al. (1983) J.
Bacteriol. 153:163], Schizosaccharomyces pombe [Beach and Nurse (1981) Nature 300:706], and Yarrowia
lipolytica [Davidow, et al. (1985) Curr. Genet. 10:39; Gaillardin, et al. (1985) Curr. Genet. 10:49].

Methods of introducing exogenous DNA into yeast hosts are well-known in the art, and usually include either
the transformation of spheroplasts or of intact yeast cells treated with alkali cations. Transformation procedures
usually vary with the yeast species to be transformed. See eg. [Kurtz et al. (1986) Mol. Cell. Biol. 6:142; Kunze
et al. (1985) J. Basic Microbiol. 25:141; Candida]; [Gleeson et al. (1986) J. Gen. Microbiol. 132:3459;
Roggenkamp et al. (1986) Mol. Gen. Genet. 202:302; Hansenula]; [Das et al. (1984) J. Bacteriol. 158:1165; De
Louvencourt et al. (1983) J. Bacteriol. 154:737; Van den Berg et al. (1990) Bio/Technology 8:135;
Kluyveromyces]; [Cregg et al. (1985) Mol. Cell. Biol. 5:3376; Kunze et al. (1985) J. Basic Microbiol. 25:141;
US Patent Nos. 4,837,148 and 4,929,555; Pichia]; [Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA 75:1929; Ito
et al. (1983) J. Bacteriol. 153:163; Saccharomyces]; [Beach and Nurse (1981) Nature 300:706;
Schizosaccharomyces]; [Davidow et al. (1985) Curr. Genet. 10:39; Gaillardin et al. (1985) Curr. Genet. 10:49;
Yarrowia].

Antibodies 

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides composed of at least one
antibody combining site. An "antibody combining site" is the three-dimensional binding space with an internal
surface shape and charge distribution complementary to the features of an epitope of an antigen, which allows a
binding of the antibody with the antigen. "Antibody" includes, for example, vertebrate antibodies, hybrid
antibodies, chimeric antibodies, humanised antibodies, altered antibodies, univalent antibodies, Fab proteins, and
single domain antibodies.

Antibodies against the proteins of the invention are useful for affinity chromatography, immunoassays, and
distinguishing/identifying streptococcus proteins.

Antibodies to the proteins of the invention, both polyclonal and monoclonal, may be prepared by conventional
methods. In general, the protein is first used to immunize a suitable animal, preferably a mouse, rat, rabbit or
goat. Rabbits and goats are preferred for the preparation of polyclonal sera due to the volume of serum
obtainable, and the availability of labeled anti-rabbit and anti-goat antibodies. Immunization is generally
performed by mixing or emulsifying the protein in saline, preferably in an adjuvant such as Freund's complete
adjuvant, and injecting the mixture or emulsion parenterally (generally subcutaneously or intramuscularly). A
dose of 50-200 μg/injection is typically sufficient. Immunization is generally boosted 2-6 weeks later with one or
more injections of the protein in saline, preferably using Freund's incomplete adjuvant. One may alternatively
generate antibodies by in vitro immunization using methods known in the art, which for the purposes of this
invention is considered equivalent to in vivo immunization. Polyclonal antisera are obtained by bleeding the
immunized animal into a glass or plastic container, incubating the blood at 25°C for one hour, followed by
incubating at 4°C for 2-18 hours. The serum is recovered by centrifugation (eg. 1,000g for 10 minutes). About
20-50 ml per bleed may be obtained from rabbits.
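
A centrifugal force quoted in multiples of g has to be converted to a rotor speed before a centrifuge can be set. The sketch below uses the standard relation RCF = 1.118 x 10^-5 x r x rpm^2, with r the rotor radius in centimetres. The 10 cm radius is an assumed example value, not a figure from this specification.

```python
import math

# Standard conversion constant for radius in cm and speed in rpm.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at a given speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Assumed rotor radius of 10 cm; target force of 1,000 g as in the text above.
speed = rpm_from_rcf(1000.0, radius_cm=10.0)
```

For the assumed 10 cm rotor, `speed` comes out at roughly 3,000 rpm; a different rotor radius gives a different speed for the same force.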

Monoclonal antibodies are prepared using the standard method of Kohler & Milstein [Nature (1975)
256:495-96], or a modification thereof. Typically, a mouse or rat is immunized as described above. However,
rather than bleeding the animal to extract serum, the spleen (and optionally several large lymph nodes) is
removed and dissociated into single cells. If desired, the spleen cells may be screened (after removal of
nonspecifically adherent cells) by applying a cell suspension to a plate or well coated with the protein antigen.
B-cells expressing membrane-bound immunoglobulin specific for the antigen bind to the plate, and are not
rinsed away with the rest of the suspension. Resulting B-cells, or all dissociated spleen cells, are then induced to
fuse with myeloma cells to form hybridomas, and are cultured in a selective medium (eg. hypoxanthine,
aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated by limiting dilution, and are
assayed for production of antibodies which bind specifically to the immunizing antigen (and which do not bind
to unrelated antigens). The selected MAb-secreting hybridomas are then cultured either in vitro (eg. in tissue
culture bottles or hollow fiber reactors), or in vivo (as ascites in mice).

If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional techniques.
Suitable labels include fluorophores, chromophores, radioactive atoms (particularly 32P and 125I), electron-dense
reagents, enzymes, and ligands having specific binding partners. Enzymes are typically detected by their
activity. For example, horseradish peroxidase is usually detected by its ability to convert
3,3',5,5'-tetramethylbenzidine (TMB) to a blue pigment, quantifiable with a spectrophotometer. "Specific
binding partner" refers to a protein capable of binding a ligand molecule with high specificity, as for example in
the case of an antigen and a monoclonal antibody specific therefor. Other specific binding partners include biotin
and avidin or streptavidin, IgG and protein A, and the numerous receptor-ligand couples known in the art. It
should be understood that the above description is not meant to categorize the various labels into distinct classes,
as the same label may serve in several different modes. For example, 125I may serve as a radioactive label or as
an electron-dense reagent. HRP may serve as enzyme or as antigen for a MAb. Further, one may combine
various labels for desired effect. For example, MAbs and avidin also require labels in the practice of this
invention: thus, one might label a MAb with biotin, and detect its presence with avidin labeled with 125I, or with
an anti-biotin MAb labeled with HRP. Other permutations and possibilities will be readily apparent to those of
ordinary skill in the art, and are considered as equivalents within the scope of the instant invention.

Pharmaceutical Compositions

Pharmaceutical compositions can comprise either polypeptides, antibodies, or nucleic acid of the invention. The
pharmaceutical compositions will comprise a therapeutically effective amount of either polypeptides, antibodies,
or polynucleotides of the claimed invention.

The term "therapeutically effective amount" as used herein refers to an amount of a therapeutic agent to treat,
ameliorate, or prevent a desired disease or condition, or to exhibit a detectable therapeutic or preventative effect.
The effect can be detected by, for example, chemical markers or antigen levels. Therapeutic effects also include
reduction in physical symptoms, such as decreased body temperature. The precise effective amount for a subject
will depend upon the subject's size and health, the nature and extent of the condition, and the therapeutics or
combination of therapeutics selected for administration. Thus, it is not useful to specify an exact effective
amount in advance. However, the effective amount for a given situation can be determined by routine
experimentation and is within the judgement of the clinician.

For purposes of the present invention, an effective dose will be from about 0.01 mg/kg to 50 mg/kg or 0.05
mg/kg to about 10 mg/kg of the DNA constructs in the individual to which it is administered.

A pharmaceutical composition can also contain a pharmaceutically acceptable carrier. The term
"pharmaceutically acceptable carrier" refers to a carrier for administration of a therapeutic agent, such as
antibodies or a polypeptide, genes, and other therapeutic agents. The term refers to any pharmaceutical carrier
that does not itself induce the production of antibodies harmful to the individual receiving the composition, and
which may be administered without undue toxicity. Suitable carriers may be large, slowly metabolized
macromolecules such as proteins, polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids,
amino acid copolymers, and inactive virus particles. Such carriers are well known to those of ordinary skill in
the art.

Pharmaceutically acceptable salts can be used therein, for example, mineral acid salts such as hydrochlorides,
hydrobromides, phosphates, sulfates, and the like; and the salts of organic acids such as acetates, propionates,
malonates, benzoates, and the like. A thorough discussion of pharmaceutically acceptable excipients is available
in Remington's Pharmaceutical Sciences (Mack Pub. Co., N.J. 1991).

Pharmaceutically acceptable carriers in therapeutic compositions may contain liquids such as water, saline,
glycerol and ethanol. Additionally, auxiliary substances, such as wetting or emulsifying agents, pH buffering
substances, and the like, may be present in such vehicles. Typically, the therapeutic compositions are prepared as
injectables, either as liquid solutions or suspensions; solid forms suitable for solution in, or suspension in, liquid
vehicles prior to injection may also be prepared. Liposomes are included within the definition of a
pharmaceutically acceptable carrier.

Delivery Methods

Once formulated, the compositions of the invention can be administered directly to the subject. The subjects to
be treated can be animals; in particular, human subjects can be treated.

Direct delivery of the compositions will generally be accomplished by injection, either subcutaneously,
intraperitoneally, intravenously or intramuscularly or delivered to the interstitial space of a tissue. The
compositions can also be administered into a lesion. Other modes of administration include oral and pulmonary
administration, suppositories, nasal, and transdermal or transcutaneous applications (eg. see WO98/20734),
needles, and gene guns or hyposprays.

The nature of any carriers or other ingredients included in compositions will depend on the specific route of
administration and particular embodiment of the invention to be administered. Antibiotics, for example, exist in
various formulations.

Dosage of low molecular weight compounds will depend on the disease state or condition to be treated and other
clinical factors such as weight and condition of the human or animal and the route of administration of the
compound. For treating humans or animals, between approximately 0.5 mg/kg and 500 mg/kg of body weight
of the compound can be administered. Therapy is typically administered at lower dosages and is
continued until the desired therapeutic outcome is observed.

Dosage treatment may be a single dose schedule or a multiple dose schedule.
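
The per-kilogram range quoted above for low molecular weight compounds translates into an absolute amount once a body weight is fixed. The following is a minimal arithmetic sketch only. The function name and the 70 kg body weight are assumptions for illustration, and the defaults simply restate the 0.5 mg/kg and 500 mg/kg bounds from the text.

```python
def dose_range_mg(body_weight_kg: float,
                  low_mg_per_kg: float = 0.5,
                  high_mg_per_kg: float = 500.0) -> tuple[float, float]:
    """Return the (lowest, highest) absolute dose in mg for a given body weight."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return body_weight_kg * low_mg_per_kg, body_weight_kg * high_mg_per_kg


# Hypothetical 70 kg subject.
low_mg, high_mg = dose_range_mg(70.0)
```

For the assumed 70 kg subject this gives a range of 35 mg to 35,000 mg, which shows how wide the stated per-kilogram window is.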

Polynucleotide and polypeptide pharmaceutical compositions

In addition to the pharmaceutically acceptable carriers and salts described above, the following additional agents
can be used with polynucleotide and/or polypeptide compositions.

A. Polypeptides

One example is polypeptides, which include, without limitation: asialoorosomucoid (ASOR); transferrin;
asialoglycoproteins; antibodies; antibody fragments; ferritin; interleukins; interferons; granulocyte-macrophage
colony stimulating factor (GM-CSF); granulocyte colony stimulating factor (G-CSF); macrophage colony
stimulating factor (M-CSF); stem cell factor; and erythropoietin. Viral antigens, such as envelope proteins, can
also be used. Also, proteins from other invasive organisms can be used, such as the 17 amino acid peptide from
the circumsporozoite protein of Plasmodium falciparum known as RII.

B. Hormones, Vitamins, etc.

Other groups that can be included are, for example: hormones, steroids, androgens, estrogens, thyroid hormone,
or vitamins, folic acid.

C. Polyalkylenes, Polysaccharides, etc.

Also, polyalkylene glycol can be included with the desired polynucleotides/polypeptides. In a preferred
embodiment, the polyalkylene glycol is polyethylene glycol. In addition, mono-, di-, or polysaccharides can be
included. In a preferred embodiment of this aspect, the polysaccharide is dextran or DEAE-dextran. Also,
chitosan and poly(lactide-co-glycolide) can be included.

D. Lipids and Liposomes

The desired polynucleotide/polypeptide can also be encapsulated in lipids or packaged in liposomes prior to
delivery to the subject or to cells derived therefrom.

Lipid encapsulation is generally accomplished using liposomes which are able to stably bind or entrap and retain
nucleic acid. The ratio of condensed polynucleotide to lipid preparation can vary but will generally be around
1:1 (mg DNA:micromoles lipid), or more of lipid. For a review of the use of liposomes as carriers for delivery of
nucleic acids, see Hug and Sleight (1991) Biochim. Biophys. Acta 1097:1-17; Straubinger (1983) Meth.
Enzymol. 101:512-527.
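
The 1:1 ratio above mixes units (milligrams of DNA against micromoles of lipid), so a short worked calculation may help. The sketch below converts a DNA mass into the matching lipid amount and, given a molar mass, into a lipid mass. The function names and the 700 g/mol molar mass are hypothetical values chosen for illustration; they are not taken from this specification.

```python
def lipid_micromoles(dna_mg: float, micromoles_per_mg_dna: float = 1.0) -> float:
    """Lipid amount (micromoles) for a DNA mass at the stated ratio (default 1:1)."""
    return dna_mg * micromoles_per_mg_dna


def lipid_mass_mg(micromoles: float, molar_mass_g_per_mol: float) -> float:
    """Convert micromoles of lipid to milligrams (umol x g/mol = ug; /1000 = mg)."""
    return micromoles * molar_mass_g_per_mol / 1000.0


# Hypothetical preparation: 0.5 mg DNA, lipid with an assumed molar mass of 700 g/mol.
amount_umol = lipid_micromoles(0.5)
amount_mg = lipid_mass_mg(amount_umol, molar_mass_g_per_mol=700.0)
```

Under these assumptions, 0.5 mg of DNA pairs with 0.5 micromoles of lipid, or about 0.35 mg; raising `micromoles_per_mg_dna` above 1.0 models the "or more of lipid" case.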

Liposomal preparations for use in the present invention include cationic (positively charged), anionic (negatively 
charged) and neutral preparations. Cationic liposomes have been shown to mediate intracellular delivery of 
plasmid DNA (Felgner (1987) Proc. Natl. Acad. Sci. USA 84:7413-7416); mRNA (Malone (1989) Proc. Natl.
Acad. Sci. USA 86:6077-6081); and purified transcription factors (Debs (1990) J. Biol. Chem.
265:10189-10192), in functional form.

Cationic liposomes are readily available. For example, N-[1-(2,3-dioleyloxy)propyl]-N,N,N-triethylammonium
(DOTMA) liposomes are available under the trademark Lipofectin, from GIBCO BRL, Grand Island, NY. (See,
also, Felgner supra). Other commercially available liposomes include Transfectace (DDAB/DOPE) and
DOTAP/DOPE (Boehringer). Other cationic liposomes can be prepared from readily available materials using
techniques well known in the art. See, eg. Szoka (1978) Proc. Natl. Acad. Sci. USA 75:4194-4198; WO90/11092
for a description of the synthesis of DOTAP (1,2-bis(oleoyloxy)-3-(trimethylammonio)propane) liposomes.
Similarly, anionic and neutral liposomes are readily available, such as from Avanti Polar Lipids (Birmingham,
AL), or can be easily prepared using readily available materials. Such materials include phosphatidyl choline,
cholesterol, phosphatidyl ethanolamine, dioleoylphosphatidyl choline (DOPC), dioleoylphosphatidyl glycerol
(DOPG), dioleoylphosphatidyl ethanolamine (DOPE), among others. These materials can also be mixed with the
DOTMA and DOTAP starting materials in appropriate ratios. Methods for making liposomes using these
materials are well known in the art.

The liposomes can comprise multilamellar vesicles (MLVs), small unilamellar vesicles (SUVs), or large
unilamellar vesicles (LUVs). The various liposome-nucleic acid complexes are prepared using methods known
in the art. See eg. Straubinger (1983) Meth. Immunol. 101:512-527; Szoka (1978) Proc. Natl. Acad. Sci. USA
75:4194-4198; Papahadjopoulos (1975) Biochim. Biophys. Acta 394:483; Wilson (1979) Cell 17:77; Deamer &
Bangham (1976) Biochim. Biophys. Acta 443:629; Ostro (1977) Biochem. Biophys. Res. Commun. 76:836;
Fraley (1979) Proc. Natl. Acad. Sci. USA 76:3348; Enoch & Strittmatter (1979) Proc. Natl. Acad. Sci. USA
76:145; Fraley (1980) J. Biol. Chem. 255:10431; Szoka & Papahadjopoulos (1978) Proc. Natl. Acad. Sci.
USA 75:145; and Schaefer-Ridder (1982) Science 215:166.

E. Lipoproteins

In addition, lipoproteins can be included with the polynucleotide/polypeptide to be delivered. Examples of
lipoproteins to be utilized include: chylomicrons, HDL, IDL, LDL, and VLDL. Mutants, fragments, or fusions
of these proteins can also be used. Also, modifications of naturally occurring lipoproteins can be used, such as
acetylated LDL. These lipoproteins can target the delivery of polynucleotides to cells expressing lipoprotein
receptors. Preferably, if lipoproteins are included with the polynucleotide to be delivered, no other targeting
ligand is included in the composition.

Naturally occurring lipoproteins comprise a lipid and a protein portion. The protein portions are known as
apoproteins. At present, apoproteins A, B, C, D, and E have been isolated and identified. At least two of
these contain several proteins, designated by Roman numerals: AI, AII, AIV; CI, CII, CIII.

A lipoprotein can comprise more than one apoprotein. For example, naturally occurring chylomicrons comprise
A, B, C & E; over time these lipoproteins lose A and acquire C & B. VLDL comprises A, B, C & E
apoproteins; LDL comprises apoprotein B; and HDL comprises apoproteins A, C & E.

The amino acid sequences of these apoproteins are known and are described in, for example, Breslow (1985) Annu. Rev.
Biochem. 54:699; Law (1986) Adv. Exp. Med. Biol. 151:162; Chen (1986) J. Biol. Chem. 261:12918; Kane (1980)
Proc. Natl. Acad. Sci. USA 77:2465; and Utermann (1984) Hum. Genet. 65:232.

Lipoproteins contain a variety of lipids including triglycerides, cholesterol (free and esters), and phospholipids.
The composition of the lipids varies in naturally occurring lipoproteins. For example, chylomicrons comprise
mainly triglycerides. A more detailed description of the lipid content of naturally occurring lipoproteins can be
found, for example, in Meth. Enzymol. 128 (1986). The composition of the lipids is chosen to aid in
conformation of the apoprotein for receptor binding activity. The composition of lipids can also be chosen to
facilitate hydrophobic interaction and association with the polynucleotide binding molecule.

Naturally occurring lipoproteins can be isolated from serum by ultracentrifugation, for instance. Such methods
are described in Meth. Enzymol. (supra); Pitas (1980) J. Biochem. 255:5454-5460 and Mahley (1979) J. Clin.
Invest. 64:743-750. Lipoproteins can also be produced by in vitro or recombinant methods by expression of the
apoprotein genes in a desired host cell. See, for example, Atkinson (1986) Annu. Rev. Biophys. Chem. 15:403 and
Radding (1958) Biochim. Biophys. Acta 30:443. Lipoproteins can also be purchased from commercial suppliers,
such as Biomedical Technologies, Inc., Stoughton, MA, USA. Further description of lipoproteins can be found
in WO98/06437.

F. Polycationic Agents

Polycationic agents can be included, with or without lipoprotein, in a composition with the desired
polynucleotide/polypeptide to be delivered.

Polycationic agents typically exhibit a net positive charge at physiologically relevant pH and are capable of
neutralizing the electrical charge of nucleic acids to facilitate delivery to a desired location. These agents have
in vitro, ex vivo, and in vivo applications. Polycationic agents can be used to deliver nucleic acids to a
living subject intramuscularly, subcutaneously, etc.

The following are examples of useful polypeptides as polycationic agents: polylysine, polyarginine,
polyornithine, and protamine. Other examples include histones, protamines, human serum albumin, DNA
binding proteins, non-histone chromosomal proteins, and coat proteins from DNA viruses, such as φX174.
Transcriptional factors also contain domains that bind DNA and therefore may be useful as nucleic acid
condensing agents. Briefly, transcriptional factors such as C/CEBP, c-jun, c-fos, AP-1, AP-2, AP-3, CPF, Prot-1,
Sp-1, Oct-1, Oct-2, CREP, and TFIID contain basic domains that bind DNA sequences.

Organic polycationic agents include: spermine, spermidine, and putrescine.

The dimensions and physical properties of a polycationic agent can be extrapolated from the list above, to
construct other polypeptide polycationic agents or to produce synthetic polycationic agents.

Synthetic polycationic agents which are useful include, for example, DEAE-dextran and polybrene. Lipofectin and
LipofectAMINE are monomers that form polycationic complexes when combined with
polynucleotides/polypeptides.

MODES FOR CARRYING OUT THE INVENTION

Isogenic deletion mutants of clinical isolate strain D39 of S.pneumoniae (serotype 2) were prepared
using Overlap Extension [Amberg et al. (1995) Yeast 11:1275-1280] for several S.pneumoniae genes
to assess the effect of deletion on viability. Precise gene disruptions were achieved by gene splicing
following a "double fusion" PCR strategy. Each process was accomplished with a total of five PCR
reactions: three standard PCR amplifications and two fusion PCR reactions. The first step was
performed by amplifying an upstream region (fragment U, primers: F1 + R2) and a downstream region
(fragment D, primers: F5 + R6) for each gene to disrupt, plus a selectable marker sequence (fragment
K, primers: F3 + R4) to replace the gene's reading frame in between. The aphA-3 gene (kanamycin
resistance) was chosen as the universal K fragment for all mutant constructs. It was amplified so as to
contain 24 bp 5' and 3' tails complementary to the U-3' and D-5' ends, respectively.
A first fusion PCR was performed to link D to K. Each amplified KD fragment was then gel purified
and a second fusion PCR reaction was performed in order to fuse it to the corresponding U fragment.
The final chimeric products constitute the gene disruption cassettes (UKD). During the final fusion PCR
in the presence of primers F1 and R6, they were amplified by AmpliTaq polymerase (Applera), which is able
to add a single deoxyadenosine to the 3' ends of both DNA strands. Each construct was ligated into a
pGEM-T Easy vector (Promega), which carries single 3'-T overhangs at the insertion site, and then
introduced by electroporation into E.coli DH10B bacteria (Invitrogen). Plasmid minipreps were
retrieved from true recombinant colonies and the correctness of the chimeric inserts was confirmed by
PCR. Plasmid DNAs were used to transform S.pneumoniae using synthetic CSP-1 to induce natural competence
[Havarstein et al. (1995) 92:11140-44]. Briefly, early log phase D39 cultures (OD600 = 0.05-0.1)
were diluted 1:10 with brain heart infusion broth (BHIB) supplemented with 100 ng/ml CSP-1, 10
mM glucose and 10% inactivated horse serum (Sigma) and incubated for 15 min at 37°C and 5%
CO2 without aeration. Plasmid DNA (1 µg) was added and samples were incubated for 1 h before
being spread on selective blood agar plates (tryptic soy agar, TSA-Difco, supplemented with 3%
defibrinated sheep blood and 500 ng/ml of kanamycin). Growth was allowed for 1-2 days at 37°C in
an atmosphere of 5% CO2. Five to ten KanR CFUs were screened for each sample either by PCR
(primers F1 + R6) or by direct sequencing of chromosomal DNA to choose the correct isogenic mutant
colony.
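The fragment layout described above can be sketched in silico. The following Python is only an illustrative model of the double-fusion design under stated assumptions, not the procedure actually used: the 24 bp tail length and the primer names F3/R4 come from the text, while the function names, the 20 nt annealing length and the placeholder sequences are hypothetical. PCR is idealised as exact string joining.

```python
TAIL_LEN = 24  # tail length stated in the text


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def k_fragment_primers(u: str, k: str, d: str, anneal_len: int = 20):
    """Return (F3, R4) for the marker K, tailed to match U-3' and D-5'.

    anneal_len is an assumed annealing length, not a value from the text.
    """
    if min(len(u), len(d)) < TAIL_LEN or len(k) < anneal_len:
        raise ValueError("fragments too short for the requested primers")
    f3 = u[-TAIL_LEN:] + k[:anneal_len]                     # U-3' tail + start of K
    r4 = revcomp(d[:TAIL_LEN]) + revcomp(k[-anneal_len:])   # D-5' tail + end of K
    return f3, r4


def fuse(left: str, right: str, overlap: int = TAIL_LEN) -> str:
    """Idealised fusion PCR: join two fragments sharing an exact overlap."""
    if left[-overlap:] != right[:overlap]:
        raise ValueError("fragments do not share the expected overlap")
    return left + right[overlap:]


def build_cassette(u: str, k: str, d: str) -> str:
    """Model the two fusion steps: K+D first, then U+KD, giving UKD."""
    if min(len(u), len(d)) < TAIL_LEN:
        raise ValueError("flanking fragments shorter than the tail length")
    tailed_k = u[-TAIL_LEN:] + k + d[:TAIL_LEN]  # K as amplified with F3 + R4
    kd = fuse(tailed_k, d)                       # first fusion PCR
    return fuse(u, kd)                           # second fusion PCR


if __name__ == "__main__":
    # Placeholder sequences, not real pneumococcal or aphA-3 sequence.
    U, K, D = "ATGC" * 10, "GATTACA" * 5, "CCGGTTAA" * 5
    print(k_fragment_primers(U, K, D))
    print(build_cassette(U, K, D) == U + K + D)
```

In this model the marker ends up exactly between the upstream and downstream flanks, which is the arrangement expected to replace the target reading frame by homologous recombination.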

Knockout of any of the 91 genes listed in Table 1 resulted in no growth, indicating that the genes are
essential for pneumococcal viability. Knockout of any of the 10 genes listed in Table 2 gave bacteria
which had poor growth characteristics when cultured in the absence of blood. In contrast, knockout
of any of the genes listed in Table 3 had no effect on growth phenotype.
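The three outcomes can be represented as a simple lookup, which may be convenient when screening a gene list against the tables. This is a minimal sketch: the gene sets below contain only a few entries copied from Tables 1 to 3 as examples, and the names `Phenotype`, `classify` and `is_candidate_target` are hypothetical.

```python
from enum import Enum
from typing import Optional


class Phenotype(Enum):
    LETHAL = "no growth (Table 1)"
    POOR_GROWTH = "poor growth in the absence of blood (Table 2)"
    NO_EFFECT = "no effect on growth phenotype (Table 3)"


# Partial example sets only; the full tables list 91, 10 and many more genes.
TABLE_1 = {"SP0005", "SP0032", "SP0336"}
TABLE_2 = {"SP0417", "SP0419", "SP0424"}
TABLE_3 = {"SP0004", "SP0010", "SP0013"}


def classify(gene: str) -> Optional[Phenotype]:
    """Return the knockout phenotype for a TIGR4 gene ID, or None if unlisted."""
    gene_id = gene.strip().upper()
    if gene_id in TABLE_1:
        return Phenotype.LETHAL
    if gene_id in TABLE_2:
        return Phenotype.POOR_GROWTH
    if gene_id in TABLE_3:
        return Phenotype.NO_EFFECT
    return None


def is_candidate_target(gene: str) -> bool:
    """Treat Table 1 and Table 2 genes as candidate antibiotic targets."""
    return classify(gene) in (Phenotype.LETHAL, Phenotype.POOR_GROWTH)
```

A gene absent from all three sets returns `None`, so an unlisted gene is never silently reported as non-essential.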

It will be understood that the invention has been described by way of example only and modifications 
may be made whilst remaining within the scope and spirit of the invention. 



Table 1: 91 genes for which knockout is lethal in TIGR4 strain

TIGR4 gene | TIGR4 annotation | R6 gene
SP0005 | peptidyl-tRNA hydrolase (pth) | [illegible]
SP0032 | DNA polymerase I (polA) | [illegible]
SP0047 | phosphoribosylformylglycinamide cyclo-ligase (purM) | [illegible]
SP0056 | adenylosuccinate lyase (purB) | [illegible]
SP0092 | ABC transporter, substrate-binding protein | [illegible]
SP0102 | glycosyl transferase | spr0091
SP0103 | capsular polysaccharide biosynthesis protein, putative | [illegible]
SP0253 | glycerol dehydrogenase (gldA) | spr0234
SP0261 | undecaprenyl diphosphate synthase (uppS) | spr240
SP0289 | dihydropteroate synthase | spr0266
SP0290 | dihydrofolate synthetase (folC) | spr267
SP0292 | bifunctional folate synthesis protein (sulD) | spr269
SP0336 | penicillin-binding protein 2X (pbpX) | spr304
SP0337 | phospho-N-acetylmuramoyl-pentapeptide-transferase (mraY) | spr305
SP0381 | mevalonate kinase (mvaK1) | spr338
SP0382 | diphosphomevalonate decarboxylase (mvaD) | spr339
SP0383 | phosphomevalonate kinase (mvaK2) | spr340
SP0397 | mannitol-1-phosphate 5-dehydrogenase (mtlD) | spr359
SP0402 | signal peptidase I (spi) | spr364
SP0418 | acyl carrier protein (acpP) | spr378
SP0420 | malonyl CoA-acyl carrier protein transacylase (fabD) | spr380
SP0423 | acetyl-CoA carboxylase, biotin carboxyl carrier protein (accB) | spr0383
SP0425 | acetyl-CoA carboxylase, biotin carboxylase (accC) | spr0385
SP0477 | 6-phospho-beta-galactosidase (lacG-1) | spr424
SP0516 | heat shock protein GrpE (grpE) | spr454
SP0529 | BlpC ABC transporter (blpB) | spr0466/0467
SP0605 | fructose-bisphosphate aldolase (fba) | spr530
SP0656 | hypothetical protein | spr0573
SP0669 | thymidylate synthase (thyA) | spr585
SP0680 | ribosomal small subunit pseudouridine synthase A (rsuA-2) | spr597
SP0689 | UDP-N-acetylglucosamine--N-acetylmuramyl-(pentapeptide) pyrophosphoryl-undecaprenol N-acetylglucosamine transferase (murG) | spr0604
SP0708 | amino acid ABC transporter, amino acid-binding protein, authentic frameshift | spr0621
SP0756 | cell division ABC transporter, ATP-binding protein FtsE (ftsE) | spr0666
SP0757 | cell division ABC transporter, permease protein FtsX (ftsX) | spr0667
SP0762 | S-adenosylmethionine synthetase (metK) | spr671
SP0806 | DNA gyrase subunit B (gyrB) | spr715
SP0839 | pantothenate kinase (coaA) | spr741
SP0865 | DNA polymerase III, gamma and tau subunits (dnaX) | [illegible]
SP0876 | 1-phosphofructokinase, putative | spr779
SP0935 | thymidylate kinase (tmk) | spr835
SP0944 | uridylate kinase (pyrH) | spr845
SP0945 | ribosome recycling factor (frr) | spr846
SP0974 | preprotein translocase, SecG subunit, putative | spr877
SP0988 | UDP-N-acetylglucosamine pyrophosphorylase (glmU) | spr891
SP1067 | cell division protein FtsW, putative | spr0973
SP1079 | GTP-binding protein, GTP1/Obg family | spr984
SP1084 | methionine aminopeptidase, type I (map) | spr992
SP1117 | DNA ligase, NAD-dependent (ligA) | spr1024
SP1128 | enolase (eno) | spr1036
SP1263 | DNA topoisomerase I (topA) | spr1141
SP1267 | licC protein (licC) | spr1145
SP1268 | licB protein (licB) | spr1146
SP1269 | choline kinase (pck) | spr1147
SP1271 | cytidine diphosphocholine pyrophosphorylase, putative | spr1149
SP1272 | polysaccharide biosynthesis protein, putative | spr1150
SP1273 | licD1 protein (licD1) | spr1151
SP1329 | N-acetylneuraminate lyase | spr1186
SP1360 | homoserine kinase (thrB) | spr1218
SP1366 | glycosyl transferase, group 1 | spr1224
SP1367 | licD3 protein (licD3) | spr1225
SP1390 | UDP-N-acetylenolpyruvoylglucosamine reductase (murB) | spr1247
SP1420 | NH(3)-dependent NAD(+) synthetase (nadE) | spr1276
SP1456 | polypeptide deformylase (def-1) | spr1310
SP1458 | thioredoxin reductase (trxB) | spr1312
SP1492 | cell wall surface anchor family protein | spr1345
SP1521 | UDP-N-acetylmuramate--alanine ligase (murC) | spr1373
SP1529 | polysaccharide biosynthesis protein, putative | spr1383
SP1530 | UDP-N-acetylmuramoylalanyl-D-glutamate-2,6-diaminopimelate ligase (murE) | spr1384
SP1534 | inorganic pyrophosphatase, manganese-dependent (ppaC) | spr1389
SP1559 | phosphoglucomutase/phosphomannomutase family protein | spr1417
SP1571 | dihydrofolate reductase (folA) | spr1429
SP1589 | Mur ligase family protein | [illegible]
SP1610 | Bcl-2 family protein | [illegible]
SP1655 | phosphoglycerate mutase (gpmA) | [illegible]
SP1667 | cell division protein FtsA (ftsA) | [illegible]
SP1670 | UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelate--D-alanyl-D-alanyl ligase (murF) | [illegible]
SP1690 | ABC transporter, substrate-binding protein | spr1534
SP1698 | alanine racemase (alr) | spr1540
SP1699 | holo-(acyl-carrier protein) synthase (acpS) | spr1541
SP1709 | phosphoglycerate dehydrogenase-related protein | spr1553
SP1726 | 3-hydroxy-3-methylglutaryl-CoA reductase | spr1570
SP1735 | methionyl-tRNA formyltransferase (fmt) | spr1580
SP1814 | indole-3-glycerol phosphate synthase (trpC) | spr1634
SP1881 | glutamate racemase (murI, glr) | spr1696
SP1906 | chaperonin, 60 kDa (groEL) | spr1722
SP1907 | chaperonin, 10 kDa (groES) | spr1723
SP1968 | phosphopantetheine adenylyltransferase (coaD) | spr1783
SP1975 | SpoIIIJ family protein | spr1790
SP2012 | glyceraldehyde 3-phosphate dehydrogenase (gap) | spr1825
SP2216 | secreted 45 kd protein (usp45) | spr2021


Table 2: 10 genes for which knockout results in poor growth characteristics in TIGR4 strain

TIGR4 gene | TIGR4 annotation | R6 gene
SP0417 | 3-oxoacyl-(acyl-carrier-protein) synthase III (fabH) | spr377
SP0419 | enoyl-(acyl-carrier-protein) reductase (fabK) | spr0379
SP0424 | (3R)-hydroxymyristoyl-(acyl-carrier-protein) dehydratase (fabZ) | spr384
SP0969 | GTP-binding protein Era (era) | spr0871
SP1161 | acetoin dehydrogenase complex, E3 component, dihydrolipoamide dehydrogenase, putative | spr1048
SP1649 | manganese ABC transporter, permease protein, putative, authentic frameshift (psaC) | spr1493
SP1650 | manganese ABC transporter, manganese-binding adhesion lipoprotein (psaA) | spr1494
SP2047 | conserved domain protein | spr1858
SP2051 | competence protein CglC (cglC) | spr1862
SP2146 | conserved hypothetical protein | spr1954



NB: where the annotation specifies an "...ase", the polypeptide generally has enzymatic activity.



Table 3: Genes for which knockout does not affect in vitro growth characteristics of TIGR4

TIGR4 gene | TIGR4 gene | TIGR4 gene | TIGR4 gene | TIGR4 gene | TIGR4 gene
SP0004 | SP0377 | SP0764 | SP1167 | SP1561 | SP1964
SP0010 | SP0378 | SP0766 | SP1168 | SP1555 | SP1967
SP0013 | SP0386 | SP0771 | SP1174/1003 | SP1557 | SP1970
SP0014/2006 | SP0390 | SP0785 | SP1175 | SP1560 | SP1978
SP0034 | SP0391 | SP0797 | SP1176 | SP1573 | SP1981
SP0037 | SP0400 | SP0804 | SP1190 | SP1576 | SP1990
SP0041 | SP0403 | SP0820 | SP1191 | SP1580 | SP1992
SP0042 | SP0406 | SP0825 | SP1192 | SP1586 | SP1995
SP0043 | SP0410 | SP0829 | SP1193 | SP1591 | SP2006/0014
SP0044 | SP0413 | SP0834 | SP1200 | SP1603 | SP2010
SP0046 | SP0421 | SP0845 | SP1202 | SP1608 | SP2017
SP0046 | SP0422 | SP0858 | SP1204 | SP1623 | SP2029
SP0048 | SP0435 | SP0859 | SP1208 | SP1634 | SP2033
SP0053 | SP0439 | SP0860 | SP1218 | SP1645 | SP2041
SP0054 | SP0447 | SP0872 | SP1225 | SP1647 | SP2044
SP0067 | SP0457 | SP0873 | SP1232 | SP1648 | SP2050
SP0060 | SP0459 | SP0881 | SP1243 | SP1651 | SP2053
SP0075 | SP0483 | SP0894 | SP1244 | SP1654 | SP2056
SP0079 | SP0494 | SP0899 | SP1274 | SP1672 | SP2060
SP0082 | SP0498 | SP0907 | SP1283 | SP1673 | SP2063
SP0098 | SP0502 | SP0916 | SP1284 | SP1676 | SP2066
SP0104 | SP0526 | SP0920 | SP1287 | SP1683 | SP2086
SP0105+0106 | SP0545 | SP0928 | SP1298 | SP1685/1330 | SP2091
SP0107 | SP0585 | SP0929 | SP1308 | SP1687 | SP2092
SP0109 | SP0589 | SP0930 | SP1330/1685 | SP1693 | SP2096
SP0112 | SP0599 | SP0931 | SP1342 | SP1695 | SP2098


SP0117 




SP0601 




SP0932 
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